Methods for processing audio signals and circuit arrangements therefor

ABSTRACT

A method for processing audio signals is provided comprising outputting an audio signal; receiving the output audio signal via a first receiving path as a first received audio signal; receiving the output audio signal via a second receiving path as a second received audio signal; determining an echo suppression gain based on the first received audio signal and the second received audio signal; and filtering echo suppression of the audio signal based on the first received audio signal and the echo suppression gain.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application Ser. No. 61/645,652, which was filed May 11, 2012, and is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to methods for processing audio signals and circuit arrangements for processing audio signals.

BACKGROUND

In physics, an echo may be defined as the replica produced by the reflection of a wave in its surrounding environment. Such a phenomenon may occur in speech telecommunications. In a telephone terminal, acoustic echo is due to the coupling between the loudspeaker and the microphone of the terminal. As a consequence, the microphone signal of the telephone contains not only the useful speech signal but also echo. If no processing is done on the microphone path, the echo signal as well as the near-end speech signals are transmitted to the far-end speaker and the far-end speaker hears a delayed version of his/her own voice. The annoyance due to hearing his/her own voice increases with the level of the echo signal and with the delay between the original signal and its echo.

In order to guarantee good speech quality, some processing may be implemented on the microphone path before the transmission can take place. Acoustic echo cancellation algorithms have been widely investigated in recent years. Approaches to acoustic echo cancellation may include an adaptive filter followed by an echo postfilter. The adaptive filter produces a replica of the acoustic path. This echo path estimate is then used to estimate the echo signal that is picked up by the microphone. In practice, because of mismatch between the echo path and its estimate, some residual echo typically remains at the output of the adaptive filter. A postfilter is often used to render the echo inaudible. Echo postfilters may apply an attenuation gain to the error signal from the adaptive echo cancellation. For better double-talk performance, this attenuation can be computed in the subband or frequency domain. Nevertheless, the performance of single channel echo cancellation may still be limited, as there is typically a trade-off between echo suppression during echo-only periods and a low level of distortion of near-end speech during double-talk periods.

Mobile terminals have historically been designed with one microphone. Hence echo postfiltering solutions used in mobile terminals have been designed and optimized on the basis of one microphone observation. Additionally, these solutions may have limited performance in case of a low near-end signal-to-echo ratio (i.e. high echo compared to near-end speech). This limited performance may result in high distortion in the processed near-end speech signals during double-talk periods and therefore in poor communication quality.

Moreover, the single channel echo postfiltering problem has been tackled for decades now and there appears to be no more room for major improvements regarding solutions to the echo postfilter, especially for the mobile terminal case, where the computational complexity is somewhat limited (in comparison to video conferencing terminals, for example).

Thus, efficient methods of echo postfiltering or echo suppression are desirable.

SUMMARY

A method for processing audio signals is provided including outputting an audio signal; receiving the output audio signal via a first receiving path as a first received audio signal; receiving the output audio signal via a second receiving path as a second received audio signal; determining an echo suppression gain based on the first received audio signal and the second received audio signal; and filtering echo suppression of the audio signal based on the first received audio signal and the echo suppression gain.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various aspects are described with reference to the following drawings, in which:

FIG. 1 shows a flow diagram illustrating a method for processing audio signals.

FIG. 2 shows a flow diagram illustrating a method for processing audio signals.

FIG. 3 shows a circuit arrangement for processing audio signals.

FIG. 4 shows a circuit arrangement for processing audio signals.

FIG. 5 shows an exemplary system experiencing a dual channel echo.

FIG. 6 shows a signal model matching the physical interactions between the acoustic sources and the transducers of the system.

FIG. 7(a) shows an example of frequency response of the acoustic path between the loudspeaker and the microphones.

FIG. 7(b) shows an example of frequency response between the artificial head's mouth and the microphones.

FIG. 8 shows a circuit for processing audio signals with echo cancellation.

FIG. 9 shows a circuit for processing audio signals with echo cancellation using one adaptive filter.

FIG. 10 shows a circuit including a device for echo postfiltering and a system having transducers.

FIG. 11 shows a circuit including an alternative device for echo postfiltering and a system having transducers.

FIG. 12(a) shows the estimation error on the residual echo power spectral density (PSD) during echo-only and double-talk periods.

FIG. 12(b) shows the echo return loss enhancement (ERLE) curves and speech attenuation (SA) curves.

FIG. 12(c) shows the measure of cepstral distance during double talk.

DETAILED DESCRIPTION

The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and aspects in which the invention may be practiced. These aspects are described in sufficient detail to enable those skilled in the art to practice the invention. Other aspects may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various aspects are not necessarily mutually exclusive, as some aspects can be combined with one or more other aspects to form new aspects.

Approaches to improve speech quality in mobile terminals may include the use of multi-microphone terminals. Multi-microphone terminals may advantageously provide spatial information on the near-end acoustic environment.

In some of the following examples, the dual-channel microphone echo problem is specifically addressed. The adaptive echo cancellation problem may still be solved with classic (or standard) adaptive filters, for example normalised least mean squares (NLMS) filters. In particular, two adaptive filters (i.e., one for each microphone path) may be used.

Dual channel echo postfiltering may be provided.

For this, the postfilter may use the multi-channel information to compute the power spectral density (PSD) and the echo suppression gains that are applied on one of the error signals to achieve residual echo suppression. In various embodiments, the multi-channel architecture may not necessarily require any beamforming and may keep a moderate computational complexity compared to classic (or standard) single channel echo postfiltering while improving echo suppression performance.

Any beamforming methods may be used in order to improve spatial information.

The dual-channel postfilter may be extended so as to be used with one adaptive filter instead of two. The adaptive filter may be placed on the microphone path on which the echo postfiltering takes place. This may reduce the computational complexity of the echo processing scheme while gaining the advantage of the dual-channel architecture.

Generally, a method for processing audio signals as illustrated in FIG. 1 may be provided.

FIG. 1 shows a flow diagram 100.

The flow diagram 100 illustrates a method for processing audio signals.

In 101, an audio signal is output. For example, an audio signal may be output via a loudspeaker.

In 102, the output audio signal is received via a first receiving path as a first received audio signal. For example, the audio signal may be received via a first microphone.

In 103, the output audio signal is received via a second receiving path as a second received audio signal. For example, the output audio signal may be received via a second microphone.

In 104, an echo suppression gain is determined based on the first received audio signal and the second received audio signal.

In 105, echo suppression of the audio signal is filtered based on the first received audio signal and the echo suppression gain.

In this context, the term “ignored” may refer to not being taken into consideration. The term “determined” may refer to calculated, or estimated, or measured, for example.

In other words, a method of processing audio signals, or more specifically, of performing echo cancellation and echo suppression, may be provided. The method may include an output signal from a transducer, for example a loudspeaker, producing a sound which is then reflected back to the device and thereby produces echoes, which may be captured by microphones along with a desired signal and input into separate paths for processing. The combined signal (which may be a combination of the desired signal, the output signal and noise) in one of the separate paths may be used to determine or obtain a value to be used on the combined signal in another path such that a resultant signal may be obtained. The resultant signal may have the echoes (from the output signal) suppressed and may be similar to the desired signal.

Echo suppression gain being determined based on the first audio signal may include echo of the first audio signal being filtered to produce a first echo error signal, the echo suppression gain being determined based on the first echo error signal and echo suppression of the audio signal being filtered based on the first echo error signal.

The echo suppression gain may for example be determined based on an estimate of the residual echo power spectral density of the first received signal (e.g. at the first microphone) and an estimate of the signal-to-echo ratio of the first received signal.
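As an illustration of how a gain may follow from a residual echo PSD estimate and a signal-to-echo ratio estimate, the following minimal sketch uses a Wiener-type rule. The Wiener-type rule, the function name `wiener_echo_gain`, the gain floor and the way the signal-to-echo ratio is derived from the two PSDs are assumptions made for illustration only; they are not stated in this disclosure and other gain rules may equally be used.

```python
import numpy as np

def wiener_echo_gain(error_psd, residual_echo_psd, gain_floor=0.05, eps=1e-12):
    """Per-frequency-bin echo suppression gain (illustrative Wiener-type rule).

    error_psd:         PSD estimate of the first echo error signal, per bin
    residual_echo_psd: PSD estimate of the residual echo in that signal, per bin
    gain_floor:        assumed lower bound on the gain, to limit distortion
    """
    error_psd = np.asarray(error_psd, dtype=float)
    residual_echo_psd = np.asarray(residual_echo_psd, dtype=float)
    # Signal-to-echo ratio estimate: (total power / echo power) - 1, clipped at 0.
    ser = np.maximum(error_psd / (residual_echo_psd + eps) - 1.0, 0.0)
    # Wiener-type gain: close to 1 when speech dominates, close to 0 when echo dominates.
    gain = ser / (1.0 + ser)
    return np.maximum(gain, gain_floor)

# Three bins: echo only, equal speech and echo power, no echo.
gains = wiener_echo_gain([1.0, 1.0, 1.0], [1.0, 0.5, 0.0])
```

In this example the echo-only bin is attenuated down to the gain floor, the bin with equal speech and echo power receives a gain of about 0.5, and the echo-free bin is passed almost unchanged.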

Echo suppression gain being determined based on the second audio signal may include echo of the second audio signal being filtered to produce a second echo error signal and the echo suppression gain being determined based on the second echo error signal.

The residual echo power spectral density (PSD) of the first received signal and the signal-to-echo ratio of the first received signal are for example determined based on an estimation of a relative transfer function characterizing (e.g. in frequency domain) the dependency of the second echo error signal on the audio signal that has been output in relation to the dependency of the first echo error signal on the audio signal that has been output.

The residual echo power spectral density of the first received signal and the signal-to-echo ratio of the first received signal may for example further be determined based on an estimation of a relative transfer function characterizing (e.g. in frequency domain) the dependency of the second echo error signal on a speech signal in relation to the dependency of the first echo error signal on the speech signal.
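The two relative transfer functions described above may be written compactly in the frequency domain. The notation below is assumed for illustration only: k is a frequency index, X(k) and S(k) are the spectra of the output audio signal and of the speech signal, E_p(k) and E_s(k) are the spectra of the first and second echo error signals, the tilde terms denote the residual echo paths seen by each error signal, G_p(k) and G_s(k) denote the speech paths, and noise is neglected.

```latex
\begin{align}
E_p(k) &= G_p(k)\,S(k) + \tilde{H}_p(k)\,X(k), \\
E_s(k) &= G_s(k)\,S(k) + \tilde{H}_s(k)\,X(k), \\
\Gamma(k) &= \frac{\tilde{H}_s(k)}{\tilde{H}_p(k)}, \qquad
\Theta(k) = \frac{G_s(k)}{G_p(k)}.
\end{align}
```

Under these assumptions, Γ(k) relates the echo content of the second error signal to that of the first, and Θ(k) does the same for the speech content.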

The filtering of echo may include an adaptive echo filtering. For example, the first echo error signal is determined by subtracting a first estimate of the echo present in the first received audio signal from the first received audio signal. Similarly, the second echo error signal is for example determined by subtracting a second estimate of the echo present in the second received audio signal from the second received audio signal.

Filtering of the echo suppression of the audio signal may include ignoring the second received audio signal.

Outputting an audio signal may include outputting an audio signal via a loudspeaker.

Receiving the audio signal via a first receiving path as a first audio signal for example includes receiving the audio signal via a first microphone and receiving the output audio signal via a second receiving path as a second received audio signal for example includes receiving the output audio signal via a second microphone.

The method may further include a residual echo power being determined based on the first received audio signal and the second received audio signal. The echo suppression gain may be determined based on the residual echo power.

The multichannel audio signal information for example includes multichannel echo filtering information of the received audio signals.

The filtering of echo for example includes an adaptive echo filtering.

Outputting an audio signal may include outputting an audio signal via a loudspeaker.

The received audio signals are for example received via at least a first microphone and a second microphone.

The method as shown in the flow diagram 100 of FIG. 1 may further include at least one of the audio signal and the audio signal after echo suppression being beamformed.

In this context, the term “beamformed” or “beamforming” may generally refer to a signal processing technique used for directional signal transmission or reception.

FIG. 2 shows a flow diagram 200.

The flow diagram 200 illustrates a method for processing audio signals.

In 201, an audio signal may be output. For example, an audio signal may be output via a loudspeaker.

In 202, an echo suppression gain may be determined based on a multichannel audio signal information representing received audio signals which are received via different receiving paths. For example, the received audio signals may be received via at least a first microphone and a second microphone.

In 203, echo suppression of the audio signal may be filtered based on single-channel audio signal information representing a received audio signal received via a single receiving path and the determined echo suppression gain. The determined echo suppression gain may be determined based on the multi-channel audio signal information.
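The split between 202 and 203 may be sketched as follows: the gain is computed elsewhere from multichannel information, while the filtering itself touches only one channel. This is a minimal frame-based sketch; the function name `apply_echo_suppression`, the use of a real FFT on a single frame and the absence of windowing or overlap-add are simplifying assumptions for illustration, not details taken from this disclosure.

```python
import numpy as np

def apply_echo_suppression(primary_frame, gain):
    """Apply per-bin echo suppression gains to one frame of a single channel.

    primary_frame: time-domain frame from the single receiving path being filtered
    gain:          one real gain per rfft bin, determined from multichannel information
    """
    spectrum = np.fft.rfft(primary_frame)
    gain = np.asarray(gain, dtype=float)
    if gain.shape != spectrum.shape:
        raise ValueError("expected one gain value per frequency bin")
    # Only the single-channel spectrum is modified; other channels are not used here.
    return np.fft.irfft(gain * spectrum, n=len(primary_frame))

frame_len = 256
num_bins = frame_len // 2 + 1
frame = np.sin(2 * np.pi * 5 * np.arange(frame_len) / frame_len)
half_gain = np.full(num_bins, 0.5)
suppressed = apply_echo_suppression(frame, half_gain)
```

The second receiving path influences the result only through the gain values, which matches the idea that the filtering step may ignore the second received audio signal.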

The multichannel audio signal information may include echo filtering information of the received audio signals or multichannel echo filtering information of the received audio signals. The filtering of echo may include an adaptive echo filtering.

The method as shown in the flow diagram 200 of FIG. 2 may further include at least one of the audio signal and the audio signal after echo suppression being beamformed.

Generally, a circuit arrangement for processing audio signals as illustrated in FIG. 3 may be provided.

FIG. 3 shows a circuit arrangement 300 for processing audio signals.

The circuit arrangement for processing audio signals 300 includes an audio signal output 301 for outputting an audio signal; a first receiving path 302 configured to receive the output audio signal 301 as a first received audio signal; a second receiving path 303 configured to receive the output audio signal 301 as a second received audio signal; a determiner 304 configured to determine an echo suppression gain based on the first received audio signal and the second received audio signal; and an echo suppression filter 305 coupled to the first receiving path 302 and the determiner 304 configured to filter echo suppression of the audio signal based on the first received audio signal and the echo suppression gain.

For example, the echo suppression filter 305 may be configured to ignore the second received audio signal when filtering the audio signal based on the first echo error signal. The terms “ignore” and “determine” may similarly be defined as above for the terms “ignored” and “determined”, respectively.

The circuit arrangement 300 for example carries out a method as illustrated in FIG. 1.

The circuit arrangement for processing audio signals 300 may further include at least one echo filter, e.g. at least one adaptive echo filter, configured to filter the first received audio signal to produce a first echo error signal. The determiner 304 may be configured to determine the echo suppression gain based on the first echo error signal and the echo suppression filter 305 may be configured to filter echo suppression of the audio signal based on the first echo error signal.

The circuit arrangement 300 may further include at least one echo filter configured to filter the second audio signal to produce a second echo error signal. The determiner 304 may be configured to determine the echo suppression gain based on the second echo error signal.

For example, the at least one echo filter may include an adaptive echo filter.

The circuit arrangement 300 may further include a loudspeaker connected to the audio signal output 301.

The circuit arrangement 300 may further include a first microphone connected to the first receiving path 302 and a second microphone connected to the second receiving path 303.

The circuit arrangement 300 may further include a second determiner configured to determine a residual echo based on the first received audio signal and the second received audio signal. The second determiner may be configured to determine a second echo suppression gain based on the second received audio signal. The second determiner may use the first received audio signal and the second received audio signal to determine the second echo suppression gain. The two determiners may use audio signals received via different microphones. Furthermore, the output of the determiner and the second determiner may be applied to a beamforming circuit.

The circuit arrangement 300 may further include a beamformer configured to beamform the audio signals and/or the echo suppression filtered audio signals. For example, the beamformer may be used to beamform all multichannel received signals.

FIG. 4 shows a circuit arrangement for processing audio signals 400.

The circuit arrangement 400 for processing audio signals may include an audio signal output 401 for outputting an audio signal; a plurality of receiving paths 402 coupled to the audio signal output 401; a determiner 403 coupled to the plurality of receiving paths 402 and configured to determine an echo suppression gain based on a multichannel audio signal information representing a plurality of received audio signals received via the receiving paths 402; and an echo suppression filter 404 coupled to at least one of the plurality of receiving paths 402 and the determiner 403 and configured to filter the audio signal based on a single-channel audio signal information representing a received audio signal received via a single receiving path and the echo suppression gain.

The circuit arrangement 400 for example carries out a method as illustrated in FIG. 2.

For example, the multichannel audio signal information may include echo filtering information of the plurality of audio signals or multichannel echo filtering information of the plurality of audio signals.

The echo filter may include an adaptive echo filter.

The circuit arrangement 400 may further include a loudspeaker connected to the audio signal output 401.

The circuit arrangement 400 may further include a first microphone connected to one receiving path of the plurality of receiving paths 402; and a second microphone connected to another receiving path of the plurality of receiving paths 402.

The circuit arrangement 400 may further include a beamformer configured to beamform the audio signal and/or the echo suppression filtered audio signal.

It should be noted that aspects and features described in context of the method illustrated in FIG. 1 are analogously valid for the method illustrated in FIG. 2 and the circuits shown in FIGS. 3 and 4 and vice versa.

Computation rules for the echo and gain may be provided. Also a software implementation or a hybrid implementation (partially in hardware and partially in software) of the determination of the echo and the gain may be provided.

Examples for the method illustrated by the flow diagram 100 and the circuit arrangement 300 are described in the following in more detail.

An exemplary system experiencing dual channel echo is described in the following.

FIG. 5 shows a schematic representation of the system.

An example of a terminal 500 equipped with one loudspeaker 501 and two microphones 502, 503 is illustrated in FIG. 5. One of the microphone observations may be considered as the primary observation and the other as the secondary observation. As shown in FIG. 5, the far-end speaker voice is played by the loudspeaker 501 to the near-end speaker 504. Part of this loudspeaker signal may reflect in the near-end environment 505 and may be later on picked up by both microphones 502, 503 as an echo 506. The coupling between the loudspeaker 501 and each microphone may define one acoustic path: two acoustic paths for the two microphones 502, 503.

The microphones 502, 503 may record the near-end speaker voice or speech signal 507 and possibly the background noise 508. The near-end speaker voice 507 may also reflect in the environment 505 before being picked up by the microphones 502, 503. Because both microphones 502, 503 may not necessarily be placed at the same position, the acoustic path between the near-end speaker and each microphone may have to be modeled. It should be appreciated that FIG. 5 does not present a limiting example of the microphones' positions and the microphones 502, 503 may be placed differently on the terminal 500.

As an example, the microphones 502, 503 may be placed at corner regions of the terminal 500. The loudspeaker 501 may be placed slightly closer to the one microphone 502 as compared to the other microphone 503. As such, it may be considered that the microphone 502 provides a secondary microphone signal (or observation) and the other microphone 503 provides a primary microphone signal (or observation).

In some examples, the terminal 500 may be a telecommunication terminal that is equipped with one loudspeaker and two or more microphones.

It should also be appreciated that the terminal 500 may not be limited to only a telecommunication terminal and that the terminal 500 may be extended to a laptop or a tablet, which may also experience echo. The terminal 500 may also be a handsfree mobile terminal.

The signal model of the dual channel (DC) echo problem is schematized as shown in FIG. 6.

FIG. 6 shows a schematic representation of a signal model matching the physical interactions between the acoustic sources and the transducers of the system as described in FIG. 5, illustrating how the primary and secondary microphone signals are determined.

The primary and secondary microphone signals 600, 601 are provided by the microphones 502, 503 and are denoted y_(p)(n) and y_(s)(n) respectively. The signals d_(p)(n) 602 and d_(s)(n) 603 represent the echo signal picked up by the primary and secondary microphone 502, 503 respectively. Both are generated by the loudspeaker signal x(n) 604 of the loudspeaker 501 where h_(p|s)(n) are represented by convolutive blocks 605, 606 accounting for the acoustic path between the loudspeaker 501 and respective microphones 502, 503.

The signals s_(p)(n) 607 and s_(s)(n) 608 represent the near-end speech signal picked up by the primary and secondary microphone 502, 503 respectively. Both are generated by the near-end speech signal s(n) 609 (or 507), where g_(p|s)(n) are represented by convolutive blocks 610, 611 accounting for the acoustic path between the near-end speaker 504 and the primary or secondary microphone 502, 503.

The primary microphone signal y_(p)(n) 600 is given by the sum, provided by summing block 612, of s_(p)(n) 607 and d_(p)(n) 602. The secondary microphone signal y_(s)(n) 601 is given by the sum, provided by summing block 613, of s_(s)(n) 608 and d_(s)(n) 603.

In accordance with the signal model in FIG. 6, the following equations may be derived:

$$y_p(n) = g_p(n) * s(n) + h_p(n) * x(n) \qquad \text{Eq. (1)}$$

$$y_s(n) = g_s(n) * s(n) + h_s(n) * x(n) \qquad \text{Eq. (2)}$$

where:

- x(n) is the loudspeaker signal 604,
- y_(p|s)(n) represents the primary or secondary microphone signals 600, 601 respectively,
- h_(p|s)(n) 605, 606 represents the acoustic path between the loudspeaker 501 and the primary or secondary microphone 502, 503,
- s(n) 609 is the near-end speaker signal,
- g_(p|s)(n) 610, 611 represents the acoustic path between the near-end speaker 504 and the primary or secondary microphone 502, 503,
- * represents the convolution operation.
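Eq. (1) and Eq. (2) can be simulated directly. The sketch below is illustrative only: the function name `microphone_signals`, the short impulse responses and the white-noise test signals are assumptions, not measured data, and the convolution output is truncated to the input length for convenience.

```python
import numpy as np

def microphone_signals(s, x, g_p, g_s, h_p, h_s):
    """Return (y_p, y_s) following Eq. (1) and Eq. (2).

    s, x:     near-end speech and loudspeaker signals (same length)
    g_p, g_s: impulse responses, near-end speaker to primary/secondary microphone
    h_p, h_s: impulse responses, loudspeaker to primary/secondary microphone
    """
    n = len(x)
    # np.convolve returns the full linear convolution; keep the first n samples.
    y_p = np.convolve(g_p, s)[:n] + np.convolve(h_p, x)[:n]
    y_s = np.convolve(g_s, s)[:n] + np.convolve(h_s, x)[:n]
    return y_p, y_s

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)            # stand-in for the loudspeaker signal
s = rng.standard_normal(n)            # stand-in for the near-end speech
h_p = np.array([0.5, 0.2, 0.1])       # hypothetical echo path, primary microphone
h_s = np.array([0.9, 0.4, 0.2])       # hypothetical echo path, secondary microphone
g_p = g_s = np.array([1.0, 0.3])      # hypothetical speech paths, taken equal here
y_p, y_s = microphone_signals(s, x, g_p, g_s, h_p, h_s)
```

Setting s to zero yields echo-only microphone signals, and setting x to zero yields near-end-only signals, which is convenient for testing an echo processing chain on each condition separately.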

In order to validate the signal model of FIG. 6, measurements of impulse responses may be performed with a mock-up phone in different acoustic environments. An artificial head (HEAD Acoustics HMS II.3) with mouth simulator may be used to simulate a near-end speaker. Two different positions for the phone may be used: one where the phone may be placed at about 30 cm straight in front of the artificial head's mouth and another where the phone may be placed on a table. Recordings may be made with the phone being placed such that the two microphones of the phone may be at equal distance from the artificial mouth. It is to be noted that the above may be applied to different acoustic environments (office, cabin, street, etc.) and to any other communication device, e.g. in a handsfree mode as well as in a handset mode.

FIG. 7(a) shows an example of frequency responses of the acoustic paths between the loudspeaker and the microphones. FIG. 7(a) shows that the loudspeaker signal received by the microphones is not equally attenuated by the acoustic environment for each microphone. This may show the necessity to account for these differences by considering two acoustic echo paths (i.e. the acoustic echo path of the primary microphone 700 and the acoustic echo path of the secondary microphone 701) in the signal model of FIG. 6.

FIG. 7(b) shows an example of frequency responses between the artificial head's mouth and the microphones. FIG. 7(b) shows that both frequency responses (i.e. the acoustic path to the primary microphone 702 and the acoustic path to the secondary microphone 703) are very similar. This similarity may be explained by the position of the microphones relative to the artificial head's mouth. For this reason, it may be assumed that g_(p)(n)=g_(s)(n). Although this assumption helps to reduce the computational complexity, it is to be noted that it is not necessary. Implementations may be done without this assumption.

For achieving single channel (SC) echo cancellation, an echo cancellation circuit 800 considered may include an adaptive filter part, which includes two adaptive filters 801, 802, followed by an echo postfilter 803 as shown in FIG. 8.

A “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof. Thus, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A “circuit” may also be a processor executing software, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. Any other kind of implementation of the respective functions which are described may also be understood as a “circuit”. For example, the various components of the circuit arrangement such as the determiner may be implemented by a circuit as described above.

FIG. 8 shows the circuit 800 for processing audio signals with echo cancellation. The circuit 800 may include a system 804 of acoustic sources (i.e. near-end speech 805, loudspeaker 808 signal giving rise to echo 806 and noise 807) and transducers (i.e. loudspeaker 808 and two microphones 809, 810). The system 804 may refer to the system 500 of FIG. 5 and may be represented by the signal model as illustrated in FIG. 6.

For each microphone 809, 810, the effect of echo may be considered to be the same as in a SC echo cancellation. Therefore for each microphone signal y_(p|s)(n) 811, 812, an estimate of the echo signal 813, 814 may be obtained by the use of an adaptive filter 801, 802 as in the SC case.

Although in general, any adaptive echo cancellation process may be applied, e.g. any as such known adaptive echo cancellation algorithm, the standard NLMS algorithm may be used to estimate the echo signals.
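A textbook time-domain NLMS echo canceller for one microphone path may be sketched as follows. The function name `nlms_echo_canceller`, the filter length, the step size `mu` and the regularization constant `eps` are illustrative assumptions; the sketch is not taken from this disclosure and omits practical elements such as double-talk control.

```python
import numpy as np

def nlms_echo_canceller(x, y, num_taps=32, mu=0.5, eps=1e-8):
    """Estimate the echo in microphone signal y from loudspeaker signal x.

    Returns (e, h_hat): the error signal y - d_hat and the final echo path estimate.
    """
    h_hat = np.zeros(num_taps)   # adaptive estimate of the echo path
    x_buf = np.zeros(num_taps)   # most recent loudspeaker samples, newest first
    e = np.zeros(len(y))
    for n in range(len(y)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        d_hat = h_hat @ x_buf    # echo estimate for this sample
        e[n] = y[n] - d_hat      # error signal (residual echo plus near-end signal)
        # Normalized LMS update: step scaled by the input energy in the buffer.
        h_hat += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return e, h_hat

# Echo-only check with a hypothetical 3-tap echo path.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
h_true = np.array([0.5, -0.3, 0.2])
y = np.convolve(h_true, x)[:len(x)]
e, h_hat = nlms_echo_canceller(x, y, num_taps=8)
```

In a dual-channel arrangement the same routine would simply be run once per microphone path, each with its own coefficient vector.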

For the same reasons as in the SC case, some residual echo may be present in the error signals e_(p|s)(n) 815, 816 at the output of the acoustic echo cancellers (AECs). The error signals e_(p|s)(n) 815, 816 may be obtained by the difference, provided by respective summing blocks 817, 818, between the microphone signals y_(p|s)(n) 811, 812 and the respective estimates of the echo signals 813, 814. The postfilter 803 may be used to achieve further echo suppression. The postfilter 803 may include a filter update block 819 and an echo postfiltering block 820. The filter update block 819 produces an output 821 based on e_(p|s)(n) 815, 816 and the loudspeaker signal x(n) 822 of the loudspeaker 808. For example in FIG. 8, this output 821 and e_(p)(n) 815 are input into the echo postfiltering block 820 to give an echo suppressed signal ŝ(n) 823.

The circuit 800 may refer to the circuit arrangement 300 of FIG. 3. The loudspeaker signal x(n) 822 of the loudspeaker 808 may refer to the audio signal output 301; y_(p)(n) 811 may refer to the first receiving path 302; y_(s)(n) 812 may refer to the second receiving path 303; the filter update block 819 may refer to the determiner 304; and the echo postfiltering block 820 may refer to the echo suppression filter 305.

In a similar manner, the circuit 800 may refer to the circuit arrangement 400 of FIG. 4. The loudspeaker signal x(n) 822 of the loudspeaker 808 may refer to the audio signal output 401; y_(p|s)(n) 811, 812 may refer to the plurality of receiving paths 402; the filter update block 819 may refer to the determiner 403; and the echo postfiltering block 820 may refer to the echo suppression filter 404.

FIG. 9 shows an echo cancellation circuit 900 including an adaptivefilter part, which includes one adaptive filter 901, followed by an echopostfilter 902.

FIG. 9 shows the circuit 900 for processing audio signals with echocancellation using only one adaptive echo filter. The circuit 900 mayinclude a system 903 of acoustic sources (i.e. near-end speech 904,loudspeaker 907 signal and noise 906) and transducers (i.e. loudspeaker907 and two microphones 908, 909). The system 903 may refer to thesystem 500 of FIG. 5 and may be represented by the signal model asillustrated in FIG. 6.

In FIG. 9, the error signal e¹(n) 910 may be obtained by the difference,provided by a summing block 911, between the primary microphone signaly¹(n) 913 and an estimate of the echo signal 912. The estimate of theecho signal 912 may be obtained by having the loudspeaker signal x(n)914 going through the adaptive filter 901. The postfilter 902 may beused to achieve further echo suppression. The postfilter 902 may includean echo power spectral density PSD and gain update block 915 and an echopostfiltering block 916. The echo PSD and gain update block 915 producesan output 917 based on e¹(n) 910, the secondary microphone signal y²(n)918 and the loudspeaker signal x(n) 914 of the loudspeaker 907. Forexample in FIG. 9, this output 917 and e¹(n) 910 are input into the echopostfiltering block 916 to give an echo suppressed signal ŝ(n) 919,which may also be understood as an estimate of the near-end speechsignal s(n) 904. It is to be noted that the echo power spectral densityPSD and gain update block 915 may be equal to filter update block 819 asshown in FIG. 8.

The circuit 900 may refer to the circuit arrangement 300 of FIG. 3. The loudspeaker signal x(n) 914 of the loudspeaker 907 may refer to the audio signal output 301; y¹(n) 913 may refer to the first receiving path 302; y²(n) 918 may refer to the second receiving path 303; the echo PSD and gain update block 915 may refer to the determiner 304; and the echo postfiltering block 916 may refer to the echo suppression filter 305.

In a similar manner, the circuit 900 may refer to the circuit arrangement 400 of FIG. 4. The loudspeaker signal x(n) 914 of the loudspeaker 907 may refer to the audio signal output 401; y^(1|2)(n) 913, 918 may refer to the plurality of receiving paths 402; the echo PSD and gain update block 915 may refer to the determiner 403; and the echo postfiltering block 916 may refer to the echo suppression filter 404.

Generally, the circuit 900 may function in a similar manner as the circuit 800 of FIG. 8, with the exception that only one adaptive filter 901 is used in the circuit 900. Using only one adaptive filter 901 may reduce the computational complexity of the multi-channel echo postfilter. The use of one adaptive filter 901 may also be advantageous as the computation of the spectral gains may benefit from the high correlation between the loudspeaker signal x(n) 914 and the other microphone signals (y²(n) 918 for the example of FIG. 9), although the echo suppression itself is applied on a signal with reduced echo (e¹(n) 910 for the example of FIG. 9).

The circuits 800, 900 may be extended to the multi-channel case. With a plurality of receiving paths, for example the plurality of receiving paths 402 of FIG. 4, x(n) is the loudspeaker signal and y^(m)(n) represents the m^(th) microphone signal, with m ranging from 1 to M, the number of microphones of the terminal. Each microphone signal contains echo d^(m)(n) and near-end speech signal s^(m)(n). h^(m)(n) is the acoustic path between the loudspeaker and the m^(th) microphone such that d^(m)(n)=h^(m)(n)*x(n); ĥ^(m)(n) is the estimate of h^(m)(n); and e^(m)(n)=y^(m)(n)−d̂^(m)(n) is the error signal from the adaptive filtering for the m^(th) microphone signal. When only one adaptive filter is used as in FIG. 9, e^(m)(n)=y^(m)(n) for m≥2. g^(m)(n) is the acoustic path between the near-end speaker and the m^(th) microphone such that s^(m)(n)=g^(m)(n)*s(n), and ŝ(n) is the output of the postfilter, that is, an estimate of the near-end speech s(n).

For example, the echo suppression may still be applied only to the primary microphone path. This means existing SC echo suppression gain rules may still be used. The computation of a gain rule may generally require estimates of the residual echo PSD and of the near-end PSD. For example, the following gain rules may be used:

$W_{1}(k,i) = \frac{\Phi^{S_{p}S_{p}}(k,i)}{\Phi^{S_{p}S_{p}}(k,i) + \Phi^{\tilde{D}_{p}\tilde{D}_{p}}(k,i)} \qquad \text{Eq. (3)}$

$W_{2}(k,i) = \frac{\mathrm{SER}(k,i)}{1 + \mathrm{SER}(k,i)} \qquad \text{Eq. (4)}$

where $\Phi^{S_{p}S_{p}}(k,i)$ is the PSD of the near-end speech, $\Phi^{\tilde{D}_{p}\tilde{D}_{p}}(k,i)$ is the PSD of the residual echo at the primary microphone and $\mathrm{SER}(k,i) = \Phi^{S_{p}S_{p}}(k,i) / \Phi^{\tilde{D}_{p}\tilde{D}_{p}}(k,i)$ is the signal-to-echo ratio (SER) at the primary microphone. However, it is to be noted that any kind of gain rule may be used that uses or requires an estimate of the near-end speech PSD and/or of the residual echo PSD.
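The two gain rules can be illustrated with a minimal sketch, assuming the near-end and residual echo PSDs for one frame and frequency bin are already available as non-negative floats. The function names and the small `floor` constant are illustrative assumptions and are not part of the disclosure.

```python
def wiener_gain(phi_ss, phi_dd, floor=1e-12):
    """Eq. (3): near-end PSD over near-end plus residual echo PSD.

    `floor` (an assumption) only guards against division by zero.
    """
    return phi_ss / max(phi_ss + phi_dd, floor)


def ser_gain(ser):
    """Eq. (4): the same gain written through the signal-to-echo ratio."""
    return ser / (1.0 + ser)


# Illustrative values for one (frame, bin) pair.
phi_ss, phi_dd = 4.0, 1.0
w1 = wiener_gain(phi_ss, phi_dd)
w2 = ser_gain(phi_ss / phi_dd)
```

With the SER formed from the same two PSDs, both rules yield the same gain, so the choice between them is a matter of which quantity the estimator delivers.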

It should be appreciated and understood that computing a subband echo postfilter requires the estimation of the residual echo PSD and/or the near-end speech signal PSD. A method to estimate the residual echo and near-end speech PSDs for the DC or multichannel echo postfiltering problem may be introduced.

New estimations of residual echo and near-end PSDs are described below. For dual channel or multichannel echo postfiltering, the computation of these PSDs requires the knowledge of relative transfer functions (RTF).

A residual echo and near-end PSD estimate may be provided.

The difference for the computation of the residual echo and near-end PSDs lies in the use of at least two microphone signals instead of one. In the following example, the estimation of these PSDs for the dual channel case is discussed.

Signal equations at the postfilter in the case of two adaptive filters:

(a) Error signal equations in the time domain

$\begin{aligned} e_{p}(n) &= y_{p}(n) - \hat{d}_{p}(n) \\ &= y_{p}(n) - \hat{h}_{p}(n) * x(n) \\ &= g_{p}(n) * s(n) + \tilde{d}_{p}(n) \\ &= g_{p}(n) * s(n) + \tilde{h}_{p}(n) * x(n) \end{aligned} \qquad \text{Eq. (4)}$

$\begin{aligned} e_{s}(n) &= y_{s}(n) - \hat{d}_{s}(n) \\ &= y_{s}(n) - \hat{h}_{s}(n) * x(n) \\ &= g_{s}(n) * s(n) + \tilde{d}_{s}(n) \\ &= g_{s}(n) * s(n) + \tilde{h}_{s}(n) * x(n) \end{aligned} \qquad \text{Eq. (5)}$

where $\tilde{h}_{p|s}(n) = h_{p|s}(n) - \hat{h}_{p|s}(n)$ represents the echo path misalignment vector.

(b) Error signals in the frequency domain

$E_{p}(k,i) = G_{p}(k,i) \cdot S(k,i) + \tilde{H}_{p}(k,i) \cdot X(k,i) \qquad \text{Eq. (6)}$

$E_{s}(k,i) = G_{s}(k,i) \cdot S(k,i) + \tilde{H}_{s}(k,i) \cdot X(k,i) \qquad \text{Eq. (7)}$

where:

-   E_(p)(k,i) and E_(s)(k,i) are the Fourier transforms of the error signals of the primary and secondary microphones, respectively
-   k and i respectively represent the frame and frequency bin indexes

In the following, the frame and frequency indexes will be omitted for clarity purposes and will only be used when necessary.

(c) Residual echo signals auto- and cross-PSDs

Assuming the loudspeaker signal and the near-end speech signal are uncorrelated (i.e. their cross-PSD is null, $\Phi^{XS} = 0$), the following may be written:

$\Phi^{E_{p}E_{p}} = |G_{p}|^{2} \cdot \Phi^{SS} + |\tilde{H}_{p}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (8)}$

$\Phi^{E_{s}E_{s}} = |G_{s}|^{2} \cdot \Phi^{SS} + |\tilde{H}_{s}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (9)}$

$\Phi^{E_{p}E_{s}} = G_{p} \cdot G_{s}^{*} \cdot \Phi^{SS} + \tilde{H}_{p} \cdot \tilde{H}_{s}^{*} \cdot \Phi^{XX} \qquad \text{Eq. (10)}$

where:

-   $\Phi^{E_{p}E_{p}}$ and $\Phi^{E_{s}E_{s}}$ represent the auto-PSDs and $\Phi^{E_{p}E_{s}}$ is the cross-PSD of the error signals
-   $\Phi^{SS}$ and $\Phi^{XX}$ respectively represent the near-end speech signal and the loudspeaker auto-PSDs.

Two RTFs Γ and Θ may be defined as follows:

$\Gamma = \frac{\tilde{H}_{s}}{\tilde{H}_{p}}, \qquad \Theta = \frac{G_{s}}{G_{p}}. \qquad \text{Eq. (11)}$

Rewriting Eqs. (8) to (10) with the above notations, the following may be obtained:

$\Phi^{E_{p}E_{p}} = |G_{p}|^{2} \cdot \Phi^{SS} + |\tilde{H}_{p}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (12)}$

$\Phi^{E_{s}E_{s}} = |\Theta \cdot G_{p}|^{2} \cdot \Phi^{SS} + |\Gamma \cdot \tilde{H}_{p}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (13)}$

$\Phi^{E_{p}E_{s}} = \Theta^{*} \cdot |G_{p}|^{2} \cdot \Phi^{SS} + \Gamma^{*} \cdot |\tilde{H}_{p}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (14)}$

From Eqs. (12) to (14), new estimates of the residual echo and near-end PSDs may be deduced as follows:

$\Phi^{\tilde{D}_{p}\tilde{D}_{p}} = \frac{|\Theta|^{2} \cdot \Phi^{E_{p}E_{p}} - \Phi^{E_{s}E_{s}}}{|\Theta|^{2} - |\Gamma|^{2}} \qquad \text{Eq. (15)}$

$\Phi^{S_{p}S_{p}} = \frac{\Phi^{E_{s}E_{s}} - |\Gamma|^{2} \cdot \Phi^{E_{p}E_{p}}}{|\Theta|^{2} - |\Gamma|^{2}} \qquad \text{Eq. (16)}$
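Eqs. (15) and (16) can be checked numerically with a short sketch. It reads the numerator of Eq. (15) as |Θ|²·Φ^(E_p E_p) − Φ^(E_s E_s), which follows from Eqs. (12) and (13). The function name, the `eps` guard and the clamping of negative results to zero are assumptions made for illustration only.

```python
def dual_channel_psds(phi_ee_p, phi_ee_s, theta_mag, gamma_mag, eps=1e-12):
    """Modulus-only PSD estimates for one frequency bin.

    phi_ee_p, phi_ee_s : auto-PSDs of the primary / secondary error signals
    theta_mag, gamma_mag : |Theta| and |Gamma| (RTF moduli)
    Returns (residual echo PSD, near-end speech PSD) at the primary microphone.
    """
    denom = theta_mag ** 2 - gamma_mag ** 2
    if abs(denom) < eps:
        # The two RTF moduli coincide: the 2x2 system is singular.
        raise ValueError("|Theta| and |Gamma| are too close to separate the PSDs")
    phi_dd = (theta_mag ** 2 * phi_ee_p - phi_ee_s) / denom  # Eq. (15)
    phi_ss = (phi_ee_s - gamma_mag ** 2 * phi_ee_p) / denom  # Eq. (16)
    # Clamping is an assumption: estimation noise may push a PSD below zero.
    return max(phi_dd, 0.0), max(phi_ss, 0.0)


# Synthetic check: build the error PSDs from known components.
true_ss, true_dd = 2.0, 0.5
theta_mag, gamma_mag = 0.5, 2.0
phi_ee_p = true_ss + true_dd
phi_ee_s = theta_mag ** 2 * true_ss + gamma_mag ** 2 * true_dd
est_dd, est_ss = dual_channel_psds(phi_ee_p, phi_ee_s, theta_mag, gamma_mag)
```

The explicit singularity check reflects that the estimates are only defined when the near-end and echo RTFs differ in modulus.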

Another set of PSD estimates may be derived by taking into account the error signals cross-PSD $\Phi^{E_{p}E_{s}}$:

$\Phi^{\tilde{D}_{p}\tilde{D}_{p}} = \frac{|\Theta|^{2} \cdot \Phi^{E_{p}E_{p}} + \Phi^{E_{s}E_{s}} - 2 \cdot \mathrm{Re}\left\{ \Theta \cdot \Phi^{E_{p}E_{s}} \right\}}{|\Theta - \Gamma|^{2}} \qquad \text{Eq. (17)}$

$\Phi^{S_{p}S_{p}} = \frac{|\Gamma|^{2} \cdot \Phi^{E_{p}E_{p}} + \Phi^{E_{s}E_{s}} - 2 \cdot \mathrm{Re}\left\{ \Gamma \cdot \Phi^{E_{p}E_{s}} \right\}}{|\Theta - \Gamma|^{2}} \qquad \text{Eq. (18)}$
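A sketch of Eqs. (17) and (18) with complex RTFs is given here, assuming the cross-PSD convention of Eq. (14), i.e. Φ^(E_p E_s) = Θ*·Φ^(S_p S_p) + Γ*·Φ^(D̃_p D̃_p). The function name and the `eps` guard are illustrative assumptions.

```python
def dual_channel_psds_cross(phi_pp, phi_ss_err, phi_ps, theta, gamma, eps=1e-12):
    """PSD estimates using the error-signal cross-PSD (one frequency bin).

    phi_pp     : auto-PSD of the primary error signal (real)
    phi_ss_err : auto-PSD of the secondary error signal (real)
    phi_ps     : cross-PSD of primary and secondary error signals (complex)
    theta, gamma : complex near-end and echo RTFs
    """
    denom = abs(theta - gamma) ** 2
    if denom < eps:
        raise ValueError("Theta and Gamma are too close to separate the PSDs")
    phi_dd = (abs(theta) ** 2 * phi_pp + phi_ss_err
              - 2.0 * (theta * phi_ps).real) / denom  # Eq. (17)
    phi_sp = (abs(gamma) ** 2 * phi_pp + phi_ss_err
              - 2.0 * (gamma * phi_ps).real) / denom  # Eq. (18)
    return phi_dd, phi_sp


# Synthetic check built from Eqs. (12) to (14).
theta, gamma = 0.5 + 0.5j, 2.0 - 1.0j
true_ss, true_dd = 2.0, 0.5
phi_pp = true_ss + true_dd
phi_ss_err = abs(theta) ** 2 * true_ss + abs(gamma) ** 2 * true_dd
phi_ps = theta.conjugate() * true_ss + gamma.conjugate() * true_dd
est_dd, est_ss = dual_channel_psds_cross(phi_pp, phi_ss_err, phi_ps, theta, gamma)
```

Unlike the modulus-only estimates, this variant stays defined when |Θ| equals |Γ| as long as the two RTFs differ in phase.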

Two sets of PSD estimates may be used to compute echo postfilter gains in the case of DC echo processing. In either case (i.e. the set of Eqs. (15) and (16), or the set of Eqs. (17) and (18)), the computation of $\Phi^{\tilde{D}_{p}\tilde{D}_{p}}$ and $\Phi^{S_{p}S_{p}}$ requires the knowledge of the RTFs Γ and Θ, which are unknown in a real-time system and therefore need to be estimated. It should be understood and appreciated that the set of Eqs. (17) and (18) requires the modulus and phase of the different RTFs, while the set of Eqs. (15) and (16) only requires the modulus of the different RTFs. Phase modification in speech processing should be handled with care as it may easily introduce distortion. For this reason, by using the set of Eqs. (15) and (16) for DC echo postfiltering, phase handling may be altogether avoided.

The RTFs may need to be estimated. Methods to estimate an RTF may include the cross-spectral method, mean square error minimization or least square error minimization.

(a) Near-end speech acoustic paths RTF estimation

The near-end speech acoustic path RTF Θ is defined as:

$\Theta = \frac{G_{s}}{G_{p}}. \qquad \text{Eq. (19)}$

Θ may also be interpreted as a gain such that:

$S_{s} = \Theta \cdot S_{p} \qquad \text{Eq. (20)}$

Considering a near-end only speech activity period (i.e. E_(p)=S_(p)=G_(p)·S and E_(s)=S_(s)=G_(s)·S), an estimate Θ̂ of Θ may be obtained through mean square error (MSE) or least square error (LSE) minimization.

The minimum MSE (MMSE) criterion used for the derivation of the MMSE estimate Θ̂ is:

$\hat{\Theta}_{MMSE} = \underset{\hat{\Theta}}{\operatorname{argmin}}\left( \left| S_{s} - \hat{S}_{s} \right|^{2} \right) \quad \text{with} \quad \hat{S}_{s} = \hat{\Theta} \cdot S_{p}. \qquad \text{Eq. (21)}$

The MMSE estimate Θ̂ is then given by

$\hat{\Theta}_{MMSE} = \frac{\Phi^{S_{p}S_{s}}}{\Phi^{S_{p}S_{p}}} = \frac{\Phi^{E_{p}E_{s}}}{\Phi^{E_{p}E_{p}}}. \qquad \text{Eq. (22)}$

Another estimate in the form of an adaptive filter may be derived from Eq. (21). In this case, one has many choices for the adaptive filter, for example, LMS, NLMS or FBLMS. It should be understood that, as the minimization criterion (Eq. (21)) is in the frequency domain, using LMS or NLMS may lead to an estimate in the frequency domain. The NLMS solution, which proves to be relatively stable and robust, is as follows:

$\hat{\Theta}_{NLMS}(k+1,i) = \hat{\Theta}_{NLMS}(k,i) + \mu \, \frac{E_{p}(k,i)}{|E_{p}(k,i)|^{2}} \, \tilde{e}(k,i) \qquad \text{Eq. (23)}$

where:

-   $\tilde{e}(k,i) = E_{s}(k,i) - \hat{E}_{s}(k,i) = E_{s}(k,i) - \hat{\Theta}_{NLMS}(k,i) \cdot E_{p}(k,i)$ is the error signal
-   μ is the stepsize, which may be set to a fixed value for simplicity.
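The per-bin NLMS recursion can be sketched as follows. Two points are assumptions of this sketch rather than statements of the disclosure: the update conjugates E_p in the numerator, which is the usual complex-valued NLMS form (Eq. (23) is printed without a conjugate), and a small `eps` is added to the normalisation to avoid division by zero. The prediction of E_s is taken as Θ̂·E_p, in line with Eq. (21).

```python
def nlms_update(theta_hat, e_p, e_s, mu=0.5, eps=1e-12):
    """One NLMS step for the near-end RTF estimate in a single frequency bin.

    theta_hat : current complex estimate of Theta
    e_p, e_s  : primary / secondary error spectra for the current frame
    mu        : fixed step size (illustrative value)
    """
    err = e_s - theta_hat * e_p  # prediction error on the secondary channel
    # Conjugate and eps are assumptions of this sketch (see lead-in).
    return theta_hat + mu * e_p.conjugate() * err / (abs(e_p) ** 2 + eps)


# Near-end only frames: E_s = Theta * E_p with a known (hypothetical) Theta.
theta_true = 0.8 - 0.3j
theta_hat = 0j
frames = [1 + 2j, -0.5 + 1j, 2 - 1j, 0.3 + 0.7j] * 10
for e_p in frames:
    theta_hat = nlms_update(theta_hat, e_p, theta_true * e_p)
```

In this noise-free setting each step shrinks the estimation error by a factor of about (1 − μ), so the estimate settles on the true RTF after a few tens of frames.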

The LSE minimization may also be used to estimate the near-end RTF Θ̂. The LSE estimate of Θ may be expressed as follows:

$\hat{\Theta}_{LSE} = \frac{\left\langle \Phi^{E_{p}E_{p}} \Phi^{E_{p}E_{s}} \right\rangle - \left\langle \Phi^{E_{p}E_{p}} \right\rangle \left\langle \Phi^{E_{p}E_{s}} \right\rangle}{\left\langle \left( \Phi^{E_{p}E_{p}} \right)^{2} \right\rangle - \left\langle \Phi^{E_{p}E_{p}} \right\rangle^{2}}. \qquad \text{Eq. (24)}$

where

$\left\langle \beta \right\rangle = \left\langle \beta(k,i) \right\rangle = \frac{1}{K} \sum_{k=1}^{K} \beta(k,i)$

given a set of K measures of β along time.

Details about the derivation of Θ̂_(LSE) are presented later on. In any of the cases (Θ̂_(MMSE), Θ̂_(NLMS) or Θ̂_(LSE)), the update may be performed during near-end only activity periods.
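Eq. (24) has the form of a regression slope of the cross-PSD on the primary auto-PSD over K frames. The following sketch assumes the per-frame PSD values for one frequency bin are already available as lists; the function names are illustrative.

```python
def mean(values):
    """Time average <.> over the K frames of one frequency bin."""
    return sum(values) / len(values)


def lse_rtf(phi_pp, phi_ps):
    """Eq. (24): slope of phi_ps regressed on phi_pp.

    phi_pp : per-frame auto-PSDs of the primary error signal
    phi_ps : per-frame cross-PSDs of primary and secondary error signals
    """
    num = mean([a * b for a, b in zip(phi_pp, phi_ps)]) - mean(phi_pp) * mean(phi_ps)
    den = mean([a * a for a in phi_pp]) - mean(phi_pp) ** 2
    if den == 0:
        raise ValueError("primary PSD is constant over the window")
    return num / den


# Synthetic frames: cross-PSD = 0.5 * auto-PSD + a constant noise offset.
phi_pp = [1.0, 2.0, 3.0, 4.0]
phi_ps = [0.5 * a + 0.1 for a in phi_pp]
theta_lse = lse_rtf(phi_pp, phi_ps)
```

Because the means are subtracted, a constant offset in the cross-PSD does not bias the slope, but the estimate needs the primary PSD to vary across the window.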

Activity detection on the loudspeaker may be used to detect near-end only activity periods. For example, the activity detection may be achieved by applying a threshold on the loudspeaker and microphone signal energies. The threshold on the loudspeaker energy may avoid adaptation during far-end activity periods, whereas the threshold on the microphone signals may avoid adaptation during near-end silence periods or on low amplitude microphone signals.
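The energy-threshold detection described above might look like the following sketch. The threshold values and function names are hypothetical and would need tuning for a real terminal.

```python
def frame_energy(samples):
    """Mean squared amplitude of one time-domain frame."""
    return sum(v * v for v in samples) / len(samples)


def is_near_end_only(loudspeaker_frame, mic_frame,
                     far_end_threshold=1e-4, near_end_threshold=1e-3):
    """True when the RTF estimate may be adapted on this frame.

    The loudspeaker must be quiet (no far-end activity) and the
    microphone must carry enough energy (near-end speech present).
    Both thresholds are illustrative assumptions.
    """
    far_end_quiet = frame_energy(loudspeaker_frame) < far_end_threshold
    near_end_active = frame_energy(mic_frame) > near_end_threshold
    return far_end_quiet and near_end_active


speech = [0.1, -0.2, 0.15, -0.1]
silence = [0.0, 0.0, 0.0, 0.0]
adapt_now = is_near_end_only(silence, speech)
```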

(b) Echo paths RTF estimation

Γ is defined as the ratio between the primary and the secondary residual echo paths:

$\Gamma = \frac{\tilde{H}_{s}}{\tilde{H}_{p}} \qquad \text{Eq. (25)}$

Similarly to Θ in Eq. (19) and Eq. (20), Γ defines the link between the residual echo of the primary and secondary microphones in the following manner:

$\tilde{D}_{s} = \Gamma \cdot \tilde{D}_{p} \qquad \text{Eq. (26)}$

Introducing Eq. (26) in Eqs. (6) and (7) respectively, the following may be obtained:

$E_{p} = G_{p} \cdot S + \tilde{H}_{p} \cdot X = S_{p} + \tilde{D}_{p} \qquad \text{Eq. (27)}$

$E_{s} = G_{s} \cdot S + \tilde{H}_{s} \cdot X = S_{s} + \Gamma \cdot \tilde{D}_{p} \qquad \text{Eq. (28)}$

Using the fact that $\tilde{D}_{s}$ and $\tilde{D}_{p}$ are both generated by the loudspeaker signal x(n), Γ may be estimated through the cross-correlation. Assuming independence of the loudspeaker and near-end speech signals (i.e. $\Phi^{XS} = 0$), the cross-correlation estimator of Γ may be expressed as follows:

$\hat{\Gamma}_{CC} = \frac{\Phi^{XE_{s}}}{\Phi^{XE_{p}}} \qquad \text{Eq. (29)}$

where $\Phi^{XE_{p}}$ and $\Phi^{XE_{s}}$ are the cross-correlations between the loudspeaker and the error signals on the primary and secondary microphones respectively, and may be expressed as follows:

$\Phi^{E_{p}X} = \tilde{H}_{p} \cdot \Phi^{XX}, \qquad \Phi^{E_{s}X} = \tilde{H}_{s} \cdot \Phi^{XX} \qquad \text{Eq. (30)}$

The least square method may also be used to derive an estimate of the echo RTF Γ. In this case the minimization criterion is written as follows:

$\hat{\Gamma} = \underset{\hat{\Gamma}}{\operatorname{argmin}}\left( \sum \left| \tilde{D}_{s} - \hat{\tilde{D}}_{s} \right|^{2} \right) \qquad \text{Eq. (31)}$

The LS estimate of Γ may be expressed as follows:

$\hat{\Gamma}_{LS} = \frac{\left\langle \Phi^{E_{s}X} \Phi^{E_{p}X} \right\rangle}{\left\langle \left( \Phi^{E_{p}X} \right)^{2} \right\rangle} \qquad \text{Eq. (32)}$

The derivation of the LS estimate of Γ is presented below. It is noted that Γ̂_(LS) matches Γ̂_(CC) if only one frame is considered for the least square criterion minimization.
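The two echo RTF estimators can be compared with a short sketch. It assumes far-end only frames, so that the per-frame cross-PSDs follow Eq. (30). For complex-valued cross-PSDs the least-squares fit below conjugates the regressor, which is the standard complex least-squares form and an assumption of this sketch (Eq. (32) is written without a conjugate). Function names are illustrative.

```python
def gamma_cc(phi_es_x, phi_ep_x):
    """Cross-correlation estimator (Eq. (29)) for a single frame."""
    return phi_es_x / phi_ep_x


def gamma_ls(phi_es_x_frames, phi_ep_x_frames):
    """Least-squares fit of Gamma over several frames (cf. Eq. (32)).

    Minimises sum_r |phi_es_x[r] - Gamma * phi_ep_x[r]|^2.
    """
    num = sum(a.conjugate() * z for a, z in zip(phi_ep_x_frames, phi_es_x_frames))
    den = sum(abs(a) ** 2 for a in phi_ep_x_frames)
    return num / den


# Hypothetical residual echo paths and loudspeaker PSDs for three frames.
h_p, h_s = 0.2 + 0.1j, 0.05 - 0.3j
phi_xx = [1.0, 2.5, 0.7]
phi_ep_x = [h_p * p for p in phi_xx]  # Eq. (30), primary
phi_es_x = [h_s * p for p in phi_xx]  # Eq. (30), secondary
gamma_true = h_s / h_p
gamma_multi = gamma_ls(phi_es_x, phi_ep_x)
gamma_single = gamma_ls(phi_es_x[:1], phi_ep_x[:1])
```

With a single frame the least-squares fit reduces to the ratio of the two cross-PSDs, which is the cross-correlation estimator.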

PSD estimates involving multi-microphones may be provided. A communication terminal equipped with one loudspeaker and M microphones may be considered. Each microphone records both the echo signal, which is generated by the loudspeaker, and the near-end speech signal. The signal on the m^(th) microphone may be written as follows:

$y_{m}(n) = g_{m}(n) * s(n) + h_{m}(n) * x(n) \qquad \text{Eq. (33)}$

where

-   y_(m)(n) is the signal picked up by the m^(th) microphone
-   h_(m)(n) is the acoustic path between the loudspeaker and the m^(th) microphone
-   g_(m)(n) is the acoustic path between the near-end speaker and the m^(th) microphone.

As for the dual-channel case discussed above, an adaptive filter may be used to estimate the echo signal picked up by the m^(th) microphone. Therefore, the multi-channel postfilter may take as inputs the loudspeaker signal and the microphone signals and/or, for microphone paths using an adaptive filter, the error signals. Furthermore, the multi-channel information may only be used in the computation of the echo suppression, while echo suppression itself may take place on the m^(th) microphone path which has an adaptive filter.

FIG. 10 shows a circuit 1000 including a device 1001 for echo postfiltering and a system 1002 having transducers (i.e. a loudspeaker 1003 and multi-channel microphones, for example two microphones 1004, 1005), which are used to compute the echo PSD estimate 1006, 1007 of the m^(th) microphone path. The dots 1008, 1009 on the respective microphones 1004, 1005 may account for the presence of an adaptive filter (not shown in FIG. 10) which may possibly be used before the echo postfilter device 1001. The error signal 1006, 1007 may be used to compute the echo suppression gain that is applied on the m^(th) microphone signal, for example y^(1|2)(n) 1010, 1011, to obtain an estimate of the near-end speech received by the respective microphones 1004, 1005. An estimate of the near-end speech ŝ(n) 1012 may be synthesized by a beamformer 1013.

The device 1001 may include, for each receiving path, an echo PSD and gain update block 1014, 1015 and an echo postfiltering block 1016, 1017 prior to the beamformer 1013.

The loudspeaker signal x(n) 1018 of the loudspeaker 1003 may refer to the audio signal output 301; y¹(n) 1010 may refer to the first receiving path 302; y²(n) 1011 may refer to the second receiving path 303; the echo PSD and gain update blocks 1014, 1015 may refer to the determiner 304; and the echo postfiltering blocks 1016, 1017 may refer to the echo suppression filter 305.

The loudspeaker signal x(n) 1018 of the loudspeaker 1003 may refer to the audio signal output 401; y^(1|2)(n) 1010, 1011 may refer to the plurality of receiving paths 402; the echo PSD and gain update blocks 1014, 1015 may refer to the determiner 403; and the echo postfiltering blocks 1016, 1017 may refer to the echo suppression filter 404.

Another device 1100 for multi-channel echo postfiltering may be provided as illustrated in FIG. 11.

FIG. 11 shows a circuit 1101 including the device 1100. The circuit 1101 may be understood in a similar manner to the circuit 1000 of FIG. 10.

The dots 1102, 1103 on the microphone paths 1104, 1105 account for the presence of an adaptive filter (not shown in FIG. 11) which may possibly be used before the echo postfilter device 1100.

As compared to the device 1001 of FIG. 10, the device 1100 of FIG. 11 may include the beamforming (1) 1106, which may be used to steer the input signals towards the direction of the echo signal. This means that the signal at the output of this block should be composed of echo only. However, as beamformers have limited performance, part of the near-end speech signal may be present at the output of beamforming (1) 1106. The beamforming (2) 1107 may have the same objective as the beamforming (1) 1106, except that it steers the multi-channel signals towards the direction of the near-end signals. For the same reason as for the beamforming (1) 1106, some echo may be present at the output of the beamforming (2) 1107 block. A dual-channel postfilter 1108 may include an echo PSD and gain update block 1109 and an echo postfiltering block 1110. The dual-channel postfilter 1108 may be used to further reduce the echo present at the output of the beamforming (2) 1107 block.

FIG. 11 can be seen to be based on the fact that any multichannel echo cancellation may reduce to a dual-channel echo suppression solution. When such a scheme is used for a terminal with M microphones, the M microphone signals are given as input to the two beamformers 1106, 1107, which are used to estimate the echo or the near-end signals. These beamforming outputs may then be used as input to the echo PSD and gain update block 1109.

For the schemes illustrated in FIGS. 10 and 11, and similarly to the PSD estimates derived for FIGS. 8 and 9 (which represent the dual microphone case), echo and near-end PSDs for the m^(th) microphone path may be estimated as follows:

$\Phi_{i}^{D_{m}D_{m}}(n) = \frac{\sum_{k=1, k \neq m}^{M} \left( |\Theta_{i}^{m,k}(n)|^{2} - |\Gamma_{i}^{m,k}(n)|^{2} \right) \cdot \left( |\Theta_{i}^{m,k}(n)|^{2} \cdot \Phi_{i}^{Z^{m}Z^{m}}(n) - \Phi_{i}^{Z^{k}Z^{k}}(n) \right)}{\sum_{k=1, k \neq m}^{M} \left( |\Theta_{i}^{m,k}(n)|^{2} - |\Gamma_{i}^{m,k}(n)|^{2} \right)^{2}} \qquad \text{Eq. (34)}$

$\Phi_{i}^{S_{m}S_{m}}(n) = \frac{\sum_{k=1, k \neq m}^{M} \left( |\Theta_{i}^{m,k}(n)|^{2} - |\Gamma_{i}^{m,k}(n)|^{2} \right) \cdot \left( \Phi_{i}^{Z^{k}Z^{k}}(n) - |\Gamma_{i}^{m,k}(n)|^{2} \cdot \Phi_{i}^{Z^{m}Z^{m}}(n) \right)}{\sum_{k=1, k \neq m}^{M} \left( |\Theta_{i}^{m,k}(n)|^{2} - |\Gamma_{i}^{m,k}(n)|^{2} \right)^{2}} \qquad \text{Eq. (35)}$

where:

-   $\Phi_{i}^{Z^{m}Z^{m}}(n)$ represents the auto-PSD of z^(m)(n), which is equal either to e^(m)(n) if an adaptive filter is used for the m^(th) microphone or to y_(m)(n) if no adaptive filter is used,
-   $\Gamma_{i}^{m,k}(n)$ and $\Theta_{i}^{m,k}(n)$ are the (residual) echo and near-end speech relative transfer functions for the k^(th) microphone when computing PSD estimates for the m^(th) microphone,
-   $\Phi_{i}^{D_{m}D_{m}}(n)$ is the (residual) echo PSD on the m^(th) microphone and is required for the computation of the echo suppression gain of the m^(th) microphone,
-   $\Phi_{i}^{S_{m}S_{m}}(n)$ is the near-end speech signal PSD on the m^(th) microphone.

The relative transfer functions may be defined as follows:

$\Gamma_{i}^{m,k}(n) = \frac{\tilde{H}_{i}^{m}(n)}{\tilde{H}_{i}^{k}(n)} = \frac{H_{i}^{m}(n) - \hat{H}_{i}^{m}(n)}{H_{i}^{k}(n) - \hat{H}_{i}^{k}(n)} \quad \text{and} \quad \Theta_{i}^{m,k}(n) = \frac{G_{i}^{m}(n)}{G_{i}^{k}(n)}, \qquad \text{Eq. (36)}$

with $\hat{H}_{i}^{m}(n)$ equal to 0 in case no adaptive filter is used on the m^(th) microphone path.

The function of the postfilter may be expressed as follows:

$W = \frac{\Phi^{S_{m}S_{m}}}{\Phi^{S_{m}S_{m}} + \Phi^{D_{m}D_{m}}} \qquad \text{Eq. (37)}$

$W = \frac{\mathrm{SER}}{1 + \mathrm{SER}}. \qquad \text{Eq. (38)}$

The previous equations (Eqs. (37) and (38)) show that the computation of the postfilter requires an estimation of the echo PSD on the m^(th) microphone $\Phi^{D_{m}D_{m}}$ and/or the near-end PSD on the m^(th) microphone $\Phi^{S_{m}S_{m}}$. However, any kind of gain rule may be used that uses or requires an estimate of the near-end speech PSD and of the residual echo PSD.

In the derivation of the multi-channel PSD estimates that follows, no adaptive filter is assumed for use on the microphone paths (it is to be noted that this assumption is not limiting and is only made for reasons of simplicity of the explanation). This implies that the input microphone signals are y_(m)(n). Given the microphone observation y_(m)(n), its Fourier transform may be written as follows:

$Y_{m} = G_{m} \cdot S + H_{m} \cdot X \qquad \text{Eq. (39)}$

In the example above, two different sets of estimates for the residual echo and near-end PSDs for dual-channel terminals are discussed (the set of Eqs. (15) and (16) and the set of Eqs. (17) and (18)). Although the use of the estimates in Eqs. (17) and (18) involves phase information, which is delicate to handle in speech processing, multi-channel echo and near-end PSD estimates that match both formalisms (the set of Eqs. (15) and (16) and the set of Eqs. (17) and (18)) are presented in the following.

PSD estimates match Eqs. (15) and (16) for M=2. Assuming the loudspeaker signal and the near-end speech signal are uncorrelated (i.e. their cross-PSD is null, $\Phi^{XS} = 0$), the l^(th) microphone auto-PSD may be expressed as follows:

$\Phi^{Y_{l}Y_{l}} = \Phi^{S_{l}S_{l}} + \Phi^{D_{l}D_{l}} = |G_{l}|^{2} \cdot \Phi^{SS} + |H_{l}|^{2} \cdot \Phi^{XX} \qquad \text{Eq. (40)}$

where l is the microphone channel index ranging from 1 to M.

By introducing the RTFs of Eq. (41)

$\Gamma^{m,l} = \frac{H_{l}}{H_{m}}, \qquad \Theta^{m,l} = \frac{G_{l}}{G_{m}} \qquad \text{Eq. (41)}$

into Eq. (40), the following may be obtained:

$\Phi^{Y_{l}Y_{l}} = |\Gamma^{m,l}|^{2} \cdot \Phi^{D_{m}D_{m}} + |\Theta^{m,l}|^{2} \cdot \Phi^{S_{m}S_{m}}. \qquad \text{Eq. (42)}$

Eq. (42) shows that the l^(th) microphone auto-PSD may be written as a function of the echo signal and near-end signal PSDs of the m^(th) microphone, i.e. the quantities $\Phi^{D_{m}D_{m}}$ and $\Phi^{S_{m}S_{m}}$ to be estimated.

Considering all the M microphone signals, Eq. (42) may equivalently be written in a matrix form as follows:

$\begin{bmatrix} \Phi^{Y_{1}Y_{1}} \\ \Phi^{Y_{2}Y_{2}} \\ \vdots \\ \Phi^{Y_{M}Y_{M}} \end{bmatrix} = \begin{bmatrix} |\Gamma^{m,1}|^{2} & |\Theta^{m,1}|^{2} \\ |\Gamma^{m,2}|^{2} & |\Theta^{m,2}|^{2} \\ \vdots & \vdots \\ |\Gamma^{m,M}|^{2} & |\Theta^{m,M}|^{2} \end{bmatrix} \cdot \begin{bmatrix} \Phi^{D_{m}D_{m}} \\ \Phi^{S_{m}S_{m}} \end{bmatrix} \qquad \text{Eq. (43)}$

$Z = A \cdot V \qquad \text{Eq. (44)}$

with the following notations:

$Z = \begin{bmatrix} \Phi^{Y_{1}Y_{1}} & \Phi^{Y_{2}Y_{2}} & \ldots & \Phi^{Y_{M}Y_{M}} \end{bmatrix}^{T}, \quad A = \begin{bmatrix} |\Gamma^{m,1}|^{2} & |\Theta^{m,1}|^{2} \\ |\Gamma^{m,2}|^{2} & |\Theta^{m,2}|^{2} \\ \vdots & \vdots \\ |\Gamma^{m,M}|^{2} & |\Theta^{m,M}|^{2} \end{bmatrix}, \quad V = \begin{bmatrix} \Phi^{D_{m}D_{m}} \\ \Phi^{S_{m}S_{m}} \end{bmatrix}$

where V represents the PSDs required for the computation of the echo suppression gains for the m^(th) microphone path.

From Eq. (44), an estimate of V may be derived as:

$\hat{V} = (A^{H} A)^{-1} A^{H} Z. \qquad \text{Eq. (45)}$
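Since A has only two columns, Eq. (45) can be evaluated in closed form through the 2x2 normal equations. The sketch below assumes A is real-valued (its entries are squared moduli, so the Hermitian transpose reduces to a plain transpose). The function name and the singularity tolerance are illustrative assumptions.

```python
def least_squares_psds(rows, z, tol=1e-12):
    """Solve V = (A^T A)^-1 A^T Z for an M x 2 real matrix A.

    rows : list of (|Gamma^{m,l}|^2, |Theta^{m,l}|^2) pairs, one per microphone
    z    : list of microphone auto-PSDs, in the same order
    Returns (echo PSD, near-end PSD) for the reference microphone m.
    """
    s00 = sum(a * a for a, _ in rows)
    s01 = sum(a * b for a, b in rows)
    s11 = sum(b * b for _, b in rows)
    t0 = sum(a * y for (a, _), y in zip(rows, z))
    t1 = sum(b * y for (_, b), y in zip(rows, z))
    det = s00 * s11 - s01 * s01
    if abs(det) < tol:
        raise ValueError("A^T A is singular: echo and near-end RTFs not separable")
    phi_dd = (s11 * t0 - s01 * t1) / det
    phi_ss = (s00 * t1 - s01 * t0) / det
    return phi_dd, phi_ss


# Three microphones, reference m is the first row (both RTFs equal 1 there).
rows = [(1.0, 1.0), (4.0, 0.25), (0.25, 4.0)]
true_dd, true_ss = 0.5, 2.0
z = [g * true_dd + t * true_ss for g, t in rows]  # Eq. (42) per microphone
est_dd, est_ss = least_squares_psds(rows, z)
```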

The expansion of Eq. (45) leads to the following echo and near-end PSD estimates:

$\hat{\Phi}^{D_{m}D_{m}} = \frac{\sum_{l=1, l \neq m}^{M} \left( |\Theta^{m,l}|^{2} - |\Gamma^{m,l}|^{2} \right) \cdot \left( |\Theta^{m,l}|^{2} \cdot \Phi^{Y_{m}Y_{m}} - \Phi^{Y_{l}Y_{l}} \right)}{\sum_{l=1, l \neq m}^{M} \left( |\Theta^{m,l}|^{2} - |\Gamma^{m,l}|^{2} \right)^{2}} \qquad \text{Eq. (46)}$

$\hat{\Phi}^{S_{m}S_{m}} = \frac{\sum_{l=1, l \neq m}^{M} \left( |\Theta^{m,l}|^{2} - |\Gamma^{m,l}|^{2} \right) \cdot \left( \Phi^{Y_{l}Y_{l}} - |\Gamma^{m,l}|^{2} \cdot \Phi^{Y_{m}Y_{m}} \right)}{\sum_{l=1, l \neq m}^{M} \left( |\Theta^{m,l}|^{2} - |\Gamma^{m,l}|^{2} \right)^{2}} \qquad \text{Eq. (47)}$
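A direct transcription of Eqs. (46) and (47) is sketched below, assuming the squared RTF moduli relative to the reference microphone m are already estimated and stored per microphone. Names and the zero-denominator check are illustrative.

```python
def multichannel_psds(phi_yy, theta_mag2, gamma_mag2, m):
    """Echo and near-end PSD estimates for microphone m (one frequency bin).

    phi_yy     : list of the M microphone auto-PSDs
    theta_mag2 : list of |Theta^{m,l}|^2 for l = 0..M-1
    gamma_mag2 : list of |Gamma^{m,l}|^2 for l = 0..M-1
    m          : index of the reference microphone (0-based here)
    """
    num_dd = num_ss = den = 0.0
    for l in range(len(phi_yy)):
        if l == m:
            continue  # the sums exclude the reference microphone
        w = theta_mag2[l] - gamma_mag2[l]
        num_dd += w * (theta_mag2[l] * phi_yy[m] - phi_yy[l])  # Eq. (46)
        num_ss += w * (phi_yy[l] - gamma_mag2[l] * phi_yy[m])  # Eq. (47)
        den += w * w
    if den == 0.0:
        raise ValueError("near-end and echo RTF moduli coincide on all channels")
    return num_dd / den, num_ss / den


# Synthetic three-microphone check built from Eq. (42).
theta_mag2 = [1.0, 0.25, 4.0]
gamma_mag2 = [1.0, 4.0, 0.25]
true_dd, true_ss = 0.5, 2.0
phi_yy = [g * true_dd + t * true_ss for g, t in zip(gamma_mag2, theta_mag2)]
est_dd, est_ss = multichannel_psds(phi_yy, theta_mag2, gamma_mag2, m=0)
```

Each term of the sums is a dual-channel estimate in the style of Eqs. (15) and (16), weighted by how well the pair of microphones separates echo from near-end speech.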

PSD estimates match Eqs. (17) and (18) for M=2.

By introducing the RTFs defined in Eq. (41) in Eq. (39), the following may be written:

$Y_{l} = \Gamma^{m,l} \cdot H_{m} \cdot X + \Theta^{m,l} \cdot G_{m} \cdot S = \Gamma^{m,l} \cdot D_{m} + \Theta^{m,l} \cdot S_{m} \qquad \text{Eq. (48)}$

Eq. (48) shows that the l^(th) microphone signal may be written as a function of the echo signal and near-end signals received by the m^(th) microphone channel.

Considering all the M microphone observations, Eq. (48) may equivalently be written in a matrix form as follows:

$\begin{bmatrix} Y_{1} \\ Y_{2} \\ \vdots \\ Y_{M} \end{bmatrix} = \begin{bmatrix} \Gamma^{m,1} & \Theta^{m,1} \\ \Gamma^{m,2} & \Theta^{m,2} \\ \vdots & \vdots \\ \Gamma^{m,M} & \Theta^{m,M} \end{bmatrix} \cdot \begin{bmatrix} D_{m} \\ S_{m} \end{bmatrix} \qquad \text{Eq. (49)}$

$Y = A \cdot V \qquad \text{Eq. (50)}$

From Eq. (49), the microphone PSD matrix may be computed as follows:

$\Phi^{YY} = A \Phi^{VV} A^{H} \qquad \text{Eq. (51)}$

where

-   $\Phi^{YY} = Y \cdot Y^{H}$ is an estimate of the microphone power spectrum matrix
-   $\Phi^{VV} = V \cdot V^{H} = \begin{bmatrix} \Phi^{D_{m}D_{m}} & \Phi^{D_{m}S_{m}} \\ \left( \Phi^{D_{m}S_{m}} \right)^{*} & \Phi^{S_{m}S_{m}} \end{bmatrix}$ contains the PSDs of interest $\Phi^{D_{m}D_{m}}$ and $\Phi^{S_{m}S_{m}}$.

An estimate of $\Phi^{VV}$ is given by

$\hat{\Phi}^{VV} = (A^{H} A)^{-1} A^{H} \Phi^{YY} A (A^{H} A)^{-1} \qquad \text{Eq. (52)}$

The expansion of Eq. (52) leads to the following echo and near-end PSD estimates:

$\Phi^{D_{m}D_{m}} = \frac{\sum_{l=1, l \neq m}^{M} |\Theta^{m,l} - \Gamma^{m,l}|^{2} \left( |\Theta^{m,l}|^{2} \Phi^{Y_{m}Y_{m}} + \Phi^{Y_{l}Y_{l}} - 2 \mathrm{Re}\left\{ \Theta^{m,l} \Phi^{Y_{m}Y_{l}} \right\} \right)}{\left( \sum_{l=1, l \neq m}^{M} |\Theta^{m,l} - \Gamma^{m,l}|^{2} \right)^{2}} \qquad \text{Eq. (53)}$

$\Phi^{S_{m}S_{m}} = \frac{\sum_{l=1, l \neq m}^{M} |\Theta^{m,l} - \Gamma^{m,l}|^{2} \left( |\Gamma^{m,l}|^{2} \Phi^{Y_{m}Y_{m}} + \Phi^{Y_{l}Y_{l}} - 2 \mathrm{Re}\left\{ \Gamma^{m,l} \Phi^{Y_{m}Y_{l}} \right\} \right)}{\left( \sum_{l=1, l \neq m}^{M} |\Theta^{m,l} - \Gamma^{m,l}|^{2} \right)^{2}} \qquad \text{Eq. (54)}$

The PSD estimates may require the knowledge of the microphone signals auto-PSDs. In a real-time implementation, an estimate of these auto-PSDs may be obtained through auto-regressive smoothing.
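Auto-regressive smoothing of a PSD is commonly implemented as a first-order recursion over frames. The sketch below assumes that form; the smoothing constant `alpha` and the function name are illustrative choices and are not specified in the disclosure.

```python
def smooth_psd(previous_psd, spectrum_bin, alpha=0.9):
    """First-order recursive PSD update for one frequency bin.

    previous_psd : smoothed PSD from the previous frame
    spectrum_bin : complex spectrum value Y(k, i) of the current frame
    alpha        : smoothing constant in [0, 1); larger means slower tracking
    """
    return alpha * previous_psd + (1.0 - alpha) * abs(spectrum_bin) ** 2


# A stationary input with |Y|^2 = 4 drives the estimate towards 4.
psd = 0.0
for _ in range(200):
    psd = smooth_psd(psd, 2.0 + 0.0j)
```

The same recursion can be applied to cross-PSDs by replacing the squared modulus with the product of one spectrum and the conjugate of the other.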

The PSD estimates may be valid for the case where none of the microphone signals is processed by an adaptive filter before being used by the postfilter.

In another example, the adaptive filter may be placed on some or on all of the microphone paths before the postfilter. The use of an adaptive filter on the m^(th) microphone path means that in the above equations y_(m)(n) becomes e_(m)(n) with

$e_{m}(n) = g_{m}(n) * s(n) + \tilde{h}_{m}(n) * x(n). \qquad \text{Eq. (55)}$

RTF estimation may be provided. For example, a least square estimate of the near-end RTF may be used. Assuming near-end only activity periods and the presence of some local noise in the near-end acoustic environment, the l^(th) microphone signal may be written as follows:

$y_{l}(n) = g_{l}(n) * s(n) + b_{l}(n), \qquad \text{Eq. (56)}$

or equivalently in the frequency domain:

$Y_{l} = G_{l} \cdot S + B_{l} \qquad \text{Eq. (57)}$

where b_(l)(n) represents the ambient noise received by the l^(th) microphone and B_(l) is its Fourier transform.

By introducing the near-end RTF definition as defined in Eq. (41) into Eq. (57), the following may be obtained:

$Y_{l} = \Theta^{m,l} \cdot Y_{m} + B_{l} - \Theta^{m,l} \cdot B_{m} = \Theta^{m,l} \cdot Y_{m} + \tilde{B}_{l} \qquad \text{Eq. (58)}$

The least square estimate of the near-end RTF may be derived as follows:

$\hat{\Theta}_{LS}^{m,l} = \frac{\left\langle \Phi_{(r)}^{Y_{m}Y_{m}} \Phi_{(r)}^{Y_{l}Y_{m}} \right\rangle - \left\langle \Phi_{(r)}^{Y_{m}Y_{m}} \right\rangle \left\langle \Phi_{(r)}^{Y_{l}Y_{m}} \right\rangle}{\left\langle \left( \Phi_{(r)}^{Y_{m}Y_{m}} \right)^{2} \right\rangle - \left\langle \Phi_{(r)}^{Y_{m}Y_{m}} \right\rangle^{2}} \qquad \text{Eq. (59)}$

where

$\left\langle \beta_{(r)} \right\rangle = \frac{1}{R} \sum_{r=1}^{R} \beta_{(r)}$

given a set of R measures of β along time.

A least square estimate of the echo RTF may be provided. Assuming far-end only activity periods and the presence of some local noise in the near-end acoustic environment, the l^(th) microphone signal may be written as follows:

$y_{l}(n) = h_{l}(n) * x(n) + b_{l}(n), \qquad \text{Eq. (60)}$

or equivalently in the frequency domain:

$Y_{l} = H_{l} \cdot X + B_{l} \qquad \text{Eq. (61)}$

By introducing the echo RTF definition as defined in Eq. (41) into Eq. (61), the following may be obtained:

$Y_{l} = \Gamma^{m,l} \cdot Y_{m} + B_{l} - \Gamma^{m,l} \cdot B_{m} = \Gamma^{m,l} \cdot Y_{m} + \tilde{B}_{l} \qquad \text{Eq. (62)}$

The vector [X Y₁ . . . Y_(M)]^(T) and an observation window which may be subdivided into R frames in the time domain may be considered. Considering that the echo RTF is stationary within the observation window, the non-stationarity of speech signals may be exploited from one frame to another. For each frame r of the observation interval, the following PSD may be written:

$\Phi_{(r)}^{Y_{l}X} = \Gamma^{m,l} \cdot \Phi_{(r)}^{Y_{m}X} + \Phi_{(r)}^{\tilde{B}_{l}X}. \qquad \text{Eq. (63)}$

$\tilde{B}_{l}$ is defined by the ambient noise in the near-end acoustic environment; therefore it may be assumed that it is statistically independent from the loudspeaker signal (i.e. $\Phi_{(r)}^{\tilde{B}_{l}X} = 0$). The quantities $\Phi_{(r)}^{Y_{l}X}$ and $\Phi_{(r)}^{XX}$ may be estimated from observation signals through autoregressive smoothing, for example. Considering the observation interval of R frames, Eq. (63) may be written in a matrix form as follows:

$\begin{bmatrix} \Phi_{(1)}^{Y_{l}X} \\ \Phi_{(2)}^{Y_{l}X} \\ \vdots \\ \Phi_{(R)}^{Y_{l}X} \end{bmatrix} = \begin{bmatrix} \Phi_{(1)}^{Y_{m}X} \\ \Phi_{(2)}^{Y_{m}X} \\ \vdots \\ \Phi_{(R)}^{Y_{m}X} \end{bmatrix} \cdot \left[ \Gamma^{m,l} \right] \qquad \text{Eq. (64)}$

$Z = A \cdot V \qquad \text{Eq. (65)}$

Then the LS estimate of the echo RTF is defined as follows:

$\begin{matrix}{\hat{\Gamma} = {{\underset{\hat{\Gamma}}{\arg\;\min}\left( {\left( {Z - \hat{Z}} \right)^{H} \cdot \left( {Z - \hat{Z}} \right)} \right)\mspace{14mu}{with}\mspace{14mu}\hat{Z}} = {A \cdot \hat{V}}}} & {{Eq}.\mspace{14mu}(66)}\end{matrix}$

and is expressed as follows:

$\begin{matrix}{{\hat{\Gamma}}_{LS}^{m,l} = {{\left( {A^{H}A} \right)^{- 1}A^{H}Z} = \frac{\left\langle {\Phi_{(r)}^{Y_{l}X}\Phi_{(r)}^{Y_{m}X}} \right\rangle}{\left\langle \left( \Phi_{(r)}^{Y_{m}X} \right)^{2} \right\rangle}}} & {{Eq}.\mspace{14mu}(67)}\end{matrix}$

where

$\left\langle \beta_{(r)} \right\rangle = {\frac{1}{R}{\sum\limits_{r = 1}^{R}\beta_{(r)}}}$

given a set of R measures of β along time.
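For illustration only, the LS estimate of Eq. (67) may be sketched for a single subband as follows. The function name `ls_echo_rtf` and its argument names are hypothetical and do not appear in the disclosure. The sketch evaluates the (A^(H)A)^(−1)A^(H)Z form and therefore conjugates the entries of A; this is an assumption for complex-valued cross-PSDs, and for real-valued quantities it reduces to the ratio of averages printed in Eq. (67).

```python
def ls_echo_rtf(phi_yl_x, phi_ym_x):
    """Least-squares echo RTF estimate for one subband (sketch of Eq. (67)).

    phi_yl_x: R cross-PSD values between microphone l and the loudspeaker (Z).
    phi_ym_x: R cross-PSD values between microphone m and the loudspeaker (A).
    Both names are hypothetical; one value per frame r of the observation window.
    """
    if len(phi_yl_x) != len(phi_ym_x) or not phi_ym_x:
        raise ValueError("need two equally long, non-empty frame sequences")
    # A^H Z: conjugation is an assumption made for complex-valued PSDs.
    numerator = sum(a.conjugate() * z for a, z in zip(phi_ym_x, phi_yl_x))
    # A^H A: sum of squared magnitudes over the R frames.
    denominator = sum(abs(a) ** 2 for a in phi_ym_x)
    if denominator == 0:
        raise ZeroDivisionError("reference cross-PSD is zero in every frame")
    # The 1/R factors of the two averages cancel in the ratio.
    return numerator / denominator


# Usage: noise-free frames generated with a known RTF are recovered.
true_rtf = 0.5 + 0.2j
a_frames = [1 + 1j, 2 - 0.5j, 0.3 + 0.7j]
z_frames = [true_rtf * a for a in a_frames]
estimate = ls_echo_rtf(z_frames, a_frames)
```

In this noise-free usage, `estimate` equals `true_rtf` up to rounding; with noisy frames, the estimate would be expected to improve as the number of frames R grows.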

The performance of the dual microphone residual echo PSD estimate described above may be assessed and compared against an existing estimate.

For example, the data recorded with the mock-up phone as previously discussed may be used to generate a test database of speech signals. The microphone signals may contain both echo-only and double-talk periods. The signal-to-echo ratio (SER) may be set and measured for the primary microphone, and the SER for the secondary microphone may be computed accordingly. The SER may range from −5 dB to 10 dB. The dual channel (DC) echo processing method, for example, the method of FIGS. 1 and 2, may be compared to an existing single channel (SC) echo processing method (i.e., an SC adaptive filter followed by a postfilter). The SC echo processing may only use the primary microphone. The adaptive filter considered may be a normalized least mean square adaptive filter with variable step size. For the DC and SC echo postfilters considered, the subband gains may be computed with a Wiener rule with the SER estimated through a decision-directed approach. The DC and SC postfilters may differ by the residual echo PSD estimator.
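As a minimal sketch of how a Wiener-rule subband gain with a decision-directed SER estimate could be computed, the following may be considered. The exact update used in the evaluation is not given here, so the recursion below follows the commonly used decision-directed form and is an assumption; the function name, the argument names, the smoothing constant `beta = 0.98` and the floor `eps` are all hypothetical.

```python
def wiener_gain_decision_directed(err_power, prev_output_power,
                                  residual_echo_psd, beta=0.98, eps=1e-12):
    """Wiener gain for one subband with a decision-directed SER estimate.

    err_power: squared magnitude of the current error signal in the subband.
    prev_output_power: squared magnitude of the previous frame's postfilter
        output in the same subband.
    residual_echo_psd: residual echo PSD estimate (from the DC or SC estimator).
    beta, eps: hypothetical smoothing constant and numerical floor.
    """
    echo_psd = max(residual_echo_psd, eps)
    # A posteriori SER of the current frame.
    ser_post = err_power / echo_psd
    # Decision-directed a priori SER: mix of the previous output and the
    # half-wave rectified instantaneous estimate.
    ser_prio = (beta * prev_output_power / echo_psd
                + (1.0 - beta) * max(ser_post - 1.0, 0.0))
    # Wiener rule: gain = SER / (1 + SER).
    gain = ser_prio / (1.0 + ser_prio)
    return gain, ser_prio


# Usage: an echo-only subband is fully attenuated, a speech-dominated one is kept.
gain_echo_only, _ = wiener_gain_decision_directed(1.0, 0.0, 1.0)
gain_speech, _ = wiener_gain_decision_directed(100.0, 99.0, 1.0)
```

Under this sketch the DC and SC postfilters would share the gain rule and differ only in the value passed as `residual_echo_psd`.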

The assessment of the postfilter for the dual channel (DC) echo processing method may be performed in two steps. On one side, the PSD estimator for the DC echo processing method may be assessed in terms of symmetric segmental logarithmic error, which may be expressed as follows:

$\begin{matrix}{{\log\;{Err}} = {\frac{1}{KM}{\sum\limits_{k}^{K}{\sum\limits_{i}^{M}{{10\;{\log_{10}\left\lbrack \frac{\Phi^{{\overset{\_}{d}}_{p}{\overset{\_}{d}}_{p}}\left( {k,i} \right)}{{\hat{\Phi}}^{{\overset{\_}{d}}_{p}{\overset{\_}{d}}_{p}}\left( {k,i} \right)} \right\rbrack}}}}}}} & {{Eq}.\mspace{14mu}(68)}\end{matrix}$

where K represents the number of frames and M represents the number of subbands.
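The error measure of Eq. (68) may be sketched as below, purely as an illustration. The function and argument names are hypothetical, the PSDs are assumed to be given as K frames of M subband values each, and the small floor `eps` is an added assumption to avoid taking the logarithm of zero. The sketch averages the logarithmic ratio exactly as Eq. (68) is printed.

```python
import math


def segmental_log_error(true_psd, est_psd, eps=1e-12):
    """Average 10*log10 ratio between true and estimated residual echo PSDs.

    true_psd, est_psd: sequences of K frames, each a sequence of M subband
        PSD values (hypothetical layout).
    eps: hypothetical floor guarding against log10(0) and division by zero.
    """
    total = 0.0
    count = 0
    for true_frame, est_frame in zip(true_psd, est_psd):
        for true_val, est_val in zip(true_frame, est_frame):
            total += 10.0 * math.log10((true_val + eps) / (est_val + eps))
            count += 1
    if count == 0:
        raise ValueError("no PSD values to compare")
    # Division by K*M, i.e. the number of (frame, subband) pairs.
    return total / count


# Usage: two frames of two subbands; two bins are under-estimated by 10 dB.
log_err = segmental_log_error([[1.0, 10.0], [100.0, 1.0]],
                              [[1.0, 1.0], [10.0, 1.0]])
```

In this usage two of the four bins contribute 10 dB and two contribute 0 dB, so `log_err` is approximately 5 dB.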

On the other side, the DC postfilter may be compared to the SC postfilter in terms of echo return loss enhancement (ERLE), of speech attenuation (SA), of cepstral distance (CD) and of informal listening tests. The ERLE may represent the amount of echo suppression achieved by the adaptive filter and by the postfilter together and may be measured during echo-only periods. The SA may be used to measure the amount of speech attenuation introduced by the postfilter on the near-end speech signal during double-talk periods. The SA may be measured for the primary microphone as the attenuation between the clean speech s_(p)(n) 607 and a weighted speech signal s̄_(p)(n) as follows:

$\begin{matrix}{{SA} = {\frac{1}{\Lambda}{\sum\limits_{\lambda}^{\Lambda}{10\;\log_{10}\frac{\sum\limits_{l = 1}^{L}{s_{p}^{2}\left( {{\lambda\; L} + l} \right)}}{\sum\limits_{l = 1}^{L}{{\overset{\_}{s}}_{p}^{2}\left( {{\lambda\; L} + l} \right)}}}}}} & {{Eq}.\mspace{14mu}(69)}\end{matrix}$

where L is the length of the frames on which the segmental SA may be computed and Λ represents the number of frames during which double talk occurs.
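A minimal sketch of the segmental SA of Eq. (69) is given below for illustration. The function and argument names are hypothetical. It is assumed that the two input sequences already contain only double-talk samples, so that every complete frame of length L counts towards the average, and the floor `eps` is an added assumption to keep the ratio finite for silent frames.

```python
import math


def segmental_speech_attenuation(clean, weighted, frame_len, eps=1e-12):
    """Segmental speech attenuation in dB between clean and weighted speech.

    clean: samples of the clean near-end speech s_p(n).
    weighted: samples of the weighted speech signal (same length as clean).
    frame_len: frame length L in samples.
    eps: hypothetical floor avoiding division by zero on silent frames.
    """
    if len(clean) != len(weighted):
        raise ValueError("clean and weighted signals must have equal length")
    n_frames = len(clean) // frame_len  # number of complete double-talk frames
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total_db = 0.0
    for frame in range(n_frames):
        start = frame * frame_len
        stop = start + frame_len
        clean_energy = sum(s * s for s in clean[start:stop])
        weighted_energy = sum(s * s for s in weighted[start:stop])
        total_db += 10.0 * math.log10((clean_energy + eps)
                                      / (weighted_energy + eps))
    return total_db / n_frames


# Usage: halving the amplitude of the speech gives about 6.02 dB of attenuation.
clean_speech = [1.0, -1.0] * 8
weighted_speech = [0.5 * s for s in clean_speech]
sa_db = segmental_speech_attenuation(clean_speech, weighted_speech, frame_len=4)
```

A larger `sa_db` indicates that the postfilter removes more of the near-end speech, which is the undesirable direction for this metric.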

The weighted speech signal s̄_(p)(n) may be obtained with any suitable existing method. When processing degraded speech signals, the updated spectral gains may be stored. These gains may be applied to the clean near-end speech s_(p)(n) in the subband domain to obtain the weighted speech signal s̄_(p)(n). The cepstral distance may be similarly measured between s_(p)(n) and s̄_(p)(n). It should be appreciated that there may be no need to assess the adaptive filtering part separately, as the same adaptive filter may be used for DC and SC echo processing.

For example, the number of subbands M may be set to 256 and the subband conversion may take place through short term Fourier transform with overlap add.

Residual echo PSD estimate assessment may be performed. FIG. 12(a) shows the estimation error on the residual echo PSD during echo-only and double-talk periods. FIG. 12(a) shows that during echo-only periods the DC estimate 1200 slightly outperforms the SC estimate 1201. For both the DC estimates 1200, 1202 and SC estimates 1201, 1203, the error decreases as the SER increases, although it may be observed that this decrease is gradual and very slow.

FIG. 12(a) further shows that during double-talk periods, the error increases with the SER. This may be explained by the presence of the near-end speech signal, which disturbs the PSD estimation. Moreover, high SERs may imply a high near-end speech level compared to the echo (and therefore the residual echo) and therefore more disturbance of the residual echo estimators. From FIG. 12(a), the DC estimate can also be seen to achieve better performance than the SC estimate for low SERs. On the contrary, at high SER (SER>0 dB), the SC estimate may outperform the DC estimate. The loss of performance of the DC estimate may be explained by the fact that during double-talk, the presence of near-end speech may disturb the estimate of the RTF Γ, as the cross-PSDs used for its computation in practice may contain a component dependent on the near-end speech signal.

Residual echo suppression may be provided.

FIG. 12(b) shows the ERLE curves 1204, 1205 and SA curves 1206, 1207. The ERLE curves 1204, 1205 show that the DC echo postfilter 1204 achieves more echo suppression than the SC postfilter 1205. This may be a direct consequence of the accuracy of the PSD estimators during echo-only periods. The SA curves 1206, 1207 show that the SA for the DC case 1206 increases with the SER while it decreases for the SC case 1207. Such an increase of the SA may be an undesirable effect. Nevertheless, the DC postfilter may introduce less attenuation (up to 5 dB) of the near-end speech compared to the SC postfilter, which is a very significant difference when dealing with echo cancellation.

FIG. 12(c) shows the measure of cepstral distance during double talk. FIG. 12(c) shows that at low SERs, the DC postfilter 1208 introduces less distortion than the SC postfilter 1209. At higher SERs, the SC postfilter 1209 introduces less distortion. The measured cepstral distance may be a consequence of the PSD estimation errors.

Moreover, the DC postfilter 1208 may introduce less near-end speech attenuation during double-talk than the SC postfilter 1209. In the DC case 1208, the speech attenuation increases with the SER while it decreases for the SC case 1209. This difference of behaviour in the speech attenuation may directly reflect on the cepstral distance and may explain its increase for the DC case 1208. Informal listening tests may show that the DC postfilter 1208 yields slightly better intelligibility compared to the SC postfilter 1209 during double-talk periods. The SA introduced by the SC postfilter 1209 may be perceptible and may sometimes lead to complete suppression of the speech.

While the invention has been particularly shown and described with reference to specific aspects, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

What is claimed is:
 1. A method for processing audio signals, the method comprising: outputting an audio signal; receiving the output audio signal via a first receiving path as a first received audio signal; receiving the output audio signal via a second receiving path as a second received audio signal; determining a residual echo power based on the first received audio signal and the second received audio signal; determining an echo suppression gain based on the residual echo power and an estimate of a signal-to-echo ratio of at least a selected received audio signal from the first received audio signal and the second received audio signal; and filtering echo suppression of the audio signal based on the first received audio signal and the echo suppression gain.
 2. The method of claim 1, wherein determining an echo suppression gain based on the first audio signal comprises: filtering echo of the first audio signal to produce a first echo error signal; determining the echo suppression gain based on the first echo error signal; and filtering echo suppression of the audio signal based on the first echo error signal.
 3. The method of claim 1, wherein determining an echo suppression gain based on the second audio signal comprises: filtering echo of the second audio signal to produce a second echo error signal; and determining the echo suppression gain based on the second echo error signal.
 4. The method of claim 2, wherein the filtering of echo comprises an adaptive echo filtering.
 5. The method of claim 1, wherein the filtering echo suppression of the audio signal includes ignoring the second received audio signal.
 6. The method of claim 1, wherein outputting an audio signal comprises outputting an audio signal via a loudspeaker.
 7. The method of claim 1, wherein receiving the audio signal via a first receiving path as a first audio signal comprises receiving the audio signal via a first microphone; and wherein receiving the output audio signal via a second receiving path as a second received audio signal comprises receiving the output audio signal via a second microphone.
 8. The method of claim 1, further comprising: beamforming at least one of the audio signal and the audio signal after echo suppression.
 9. A method for processing audio signals, the method comprising: outputting an audio signal; determining an echo suppression gain based on a multichannel audio signal information representing received audio signals which are received via different receiving paths; wherein determining the echo suppression gain is based on an estimate of at least a signal-to-echo ratio of a selected set of received audio signals of the received audio signals and a second echo error signal of the received signal not selected for the estimate of the signal-to-echo ratio; and filtering echo suppression of the audio signal based on single-channel audio signal information representing the received audio signal received via a single receiving path and the determined echo suppression gain.
 10. The method of claim 9, further comprising: wherein the multichannel audio signal information comprises echo filtering information of the received audio signals.
 11. The method of claim 9, further comprising: wherein the multichannel audio signal information comprises multichannel echo filtering information of the received audio signals.
 12. The method of claim 9, further comprising: wherein the filtering of echo comprises an adaptive echo filtering.
 13. The method of claim 9, wherein outputting an audio signal comprises outputting an audio signal via a loudspeaker.
 14. The method of claim 9, wherein the received audio signals are received via at least a first microphone and a second microphone.
 15. The method of claim 9, further comprising: beamforming at least one of the audio signal and the audio signal after echo suppression.
 16. A circuit arrangement for processing audio signals comprising: an audio device for outputting an audio signal; a first receiving path configured to receive the output audio signal as a first received audio signal; a second receiving path configured to receive the output audio signal as a second received audio signal; a determiner circuit configured to determine a residual echo power based on the first received audio signal and the second received audio signal; wherein the determiner circuit is further configured to determine an echo suppression gain based on the residual echo power and an estimate of at least a signal-to-echo ratio of a selected received audio signal from the first received audio signal and the second received audio signal; and an echo suppression filter coupled to the first receiving path and the determiner circuit configured to suppress echo of the audio signal based on the first received audio signal and the echo suppression gain.
 17. The circuit arrangement of claim 16, further comprising: at least one echo filter configured to filter the first received audio signal to produce a first echo error signal; the determiner circuit configured to determine the echo suppression gain based on the first echo error signal; and the echo suppression filter configured to perform echo suppression of the audio signal based on the first echo error signal.
 18. The circuit arrangement of claim 17, further comprising: at least one echo filter configured to filter the second audio signal to produce a second echo error signal; the determiner circuit configured to determine the echo suppression gain based on the second echo error signal.
 19. The circuit arrangement of claim 17, wherein the at least one echo filter comprises an adaptive echo filter.
 20. The circuit arrangement of claim 16, the echo suppression filter configured to ignore the second received audio signal when filtering the audio signal based on the first echo error signal.
 21. The circuit arrangement of claim 16, further comprising: a loudspeaker connected to the audio signal output.
 22. The circuit arrangement of claim 16, further comprising: a first microphone connected to the first receiving path; and a second microphone connected to the second receiving path.
 23. The circuit arrangement of claim 16, further comprising: a beamformer configured to beamform the audio signals or the echo suppression filtered audio signal or both.
 24. A circuit arrangement for processing audio signals, the circuit arrangement comprising: an audio device for outputting an audio signal; a plurality of receiving paths coupled to the audio signal output; a determiner circuit configured to determine a residual echo power based on a plurality of received audio signals; wherein the determiner circuit is coupled to the plurality of receiving paths and further configured to determine an echo suppression gain based on at least a selected plurality of received audio signals of the received audio signals; wherein the echo suppression gain is based on the residual echo power, an estimate of a signal-to-echo ratio of the first received audio signal, and a second echo error signal of the received signal not selected for the estimate of the signal-to-echo ratio; and an echo suppression filter coupled to at least one of the plurality of receiving paths and the determiner circuit and configured to filter the audio signal based on a single-channel audio signal information representing a received audio signal received via a single receiving path and the echo suppression gain.
 25. The circuit arrangement of claim 24, wherein the multichannel audio signal information comprises echo filtering information of the plurality of audio signals.
 26. The circuit arrangement of claim 24, wherein the multichannel audio signal information comprises multichannel echo filtering information of the plurality of audio signals.
 27. The circuit arrangement of claim 24, wherein the at least one echo filter comprises an adaptive echo filter.
 28. The circuit arrangement of claim 24, further comprising: a loudspeaker connected to the audio signal output.
 29. The circuit arrangement of claim 24, further comprising: a first microphone connected to one receiving path of the plurality of receiving paths; and a second microphone connected to another receiving path of the plurality of receiving paths.
 30. The circuit arrangement of claim 24, further comprising: a beamformer configured to beamform the audio signals or the echo suppression filtered audio signal or both.
 31. The method of claim 1, wherein determining echo suppression gain is also based on a second echo error signal of the received signal not selected for the estimate of the signal-to-echo ratio.